Where does API key go in Google Cloud Vision API? - python

Want to use Google's Cloud Vision API for OCR. Using the python sample code here we have:
def detect_text(path):
"""Detects text in the file."""
client = vision.ImageAnnotatorClient()
with io.open(path, 'rb') as image_file:
content = image_file.read()
image = types.Image(content=content)
response = client.text_detection(image=image)
texts = response.text_annotations
print('Texts:')
for text in texts:
print('\n"{}"'.format(text.description))
vertices = (['({},{})'.format(vertex.x, vertex.y)
for vertex in text.bounding_poly.vertices])
print('bounds: {}'.format(','.join(vertices)))
Where do I put my API key? I (obviously) can't authenticate without it.

From the docs,
If you plan to use a service account with client library code, you need to set an environment variable.
Provide the credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the location of the JSON file you downloaded in the previous step.
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
So it looks like you should create a service account, download a credentials file, and set up an environmental variable to point to it.

There are two ways through which you can authenticate
Exporting the credential file as an environment variable. Here is a sample code:
from google.cloud import vision
def get_text_from_image(image_file):
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "./creds/xxxx-xxxxx.json"
try:
# process_image is a method to convert numpy array to bytestream
# (not of interest in this context hence not including it here)
byte_img = process_image_to_bytes(image_file)
client = vision.ImageAnnotatorClient()
image = vision.Image(content=byte_img)
response = client.text_detection(image=image)
texts = response.text_annotations
return texts
except BaseException as e:
print(str(e))
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] is doing the work here
Using oauth2
from google.cloud import vision
from google.oauth2 import service_account
def get_text_from_image(image_file):
creds = service_account.Credentials.from_service_account_file("./creds/xxx-xxxxx.json")
try:
# process_image is a method to convert numpy array to bytestream
# (not of interest in this context hence not including it here)
byte_img = process_image_to_bytes(image_file)
client = vision.ImageAnnotatorClient(credentials=creds)
image = vision.Image(content=byte_img)
response = client.text_detection(image=image)
texts = response.text_annotations
return texts
except BaseException as e:
print(str(e))
Here we are using the google-auth library to create a credential file from the JSON credential file and passing that object to ImageAnnotatorClient for authentication.
Hope these sample snippets helped you

Related

How to use dialogflow Client In Python

what i am trying is to get the response in python
import dialogflow
from google.api_core.exceptions import InvalidArgument
DIALOGFLOW_PROJECT_ID = 'imposing-fx-333333'
DIALOGFLOW_LANGUAGE_CODE = 'en'
GOOGLE_APPLICATION_CREDENTIALS = 'imposing-fx-333333-e6e3cb9e4adb.json'
text_to_be_analyzed = "Hi! I'm David and I'd like to eat some sushi, can you help me?"
session_client = dialogflow.SessionsClient()
session = session_client.session_path(DIALOGFLOW_PROJECT_ID, SESSION_ID)
text_input = dialogflow.types.TextInput(text=text_to_be_analyzed,
language_code=DIALOGFLOW_LANGUAGE_CODE)
query_input = dialogflow.types.QueryInput(text=text_input)
try:
response = session_client.detect_intent(session=session, query_input=query_input)
except InvalidArgument:
raise
print("Query text:", response.query_result.query_text)
print("Detected intent:", response.query_result.intent.display_name)
print("Detected intent confidence:", response.query_result.intent_detection_confidence)
print("Fulfillment text:", response.query_result.fulfillment_text)
And i am getting unable to verify credentials
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
This is my first question in stackoverflow :) i know i have done many
You need to export Service Account Key (JSON) file from your , and set an environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. Then you can make call to dialogflow.
Steps to get Service Account Key:
Make sure you are using Dialogflow v2.
Go to general settings and click on your Service Account. This will redirect you to Google Cloud Platform project’s service account page.
Next step is to create a new key for the service account. Now create a service account and choose JSON as output key. Follow the instructions and a JSON file will be downloaded to your computer. This file will be used as GOOGLE_APPLICATION_CREDENTIALS.
Now in code,
import os
import dialogflow
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file.json"
project_id = "your_project_id"
session_id = "your_session_id"
language_code = "en"
session_client = dialogflow.SessionsClient()
session = session_client.session_path(project_id, session_id)
text_input = dialogflow.types.TextInput(text=text, language_code=language_code)
query_input = dialogflow.types.QueryInput(text=text_input)
response_dialogflow = session_client.detect_intent(session=session, query_input=query_input)
This one works too in case you want to pick up the file from file system.
Recomended way is using env variables thoguh
import json
from google.cloud import dialogflow_v2
from google.oauth2 import *
session_client = None
dialogflow_key = None
creds_file = "/path/to/json/file.json"
dialogflow_key = json.load(open(creds_file))
credentials = (service_account.Credentials.from_service_account_info(dialogflow_key))
session_client = dialogflow_v2.SessionsClient(credentials=credentials)
print("it works : " + session_client.DEFAULT_ENDPOINT) if session_client is not None
else print("does not work")
I forgot to add the main article sorry...
Here it is :
https://googleapis.dev/python/google-auth/latest/user-guide.html#service-account-private-key-files

Using Textract for OCR locally

I want to extract text from images using Python. (Tessaract lib does not work for me because it requires installation).
I have found boto3 lib and Textract, but I'm having trouble working with it. I'm still new to this. Can you tell me what I need to do in order to run my script correctly.
This is my code:
import cv2
import boto3
import textract
#img = cv2.imread('slika2.jpg') #this is jpg file
with open('slika2.pdf', 'rb') as document:
img = bytearray(document.read())
textract = boto3.client('textract',region_name='us-west-2')
response = textract.detect_document_text(Document={'Bytes': img}). #gives me error
print(response)
When I run this code, I get:
botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
I have also tried this:
# Document
documentName = "slika2.jpg"
# Read document content
with open(documentName, 'rb') as document:
imageBytes = bytearray(document.read())
# Amazon Textract client
textract = boto3.client('textract',region_name='us-west-2')
# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes}) #ERROR
#print(response)
# Print detected text
for item in response["Blocks"]:
if item["BlockType"] == "LINE":
print ('\033[94m' + item["Text"] + '\033[0m')
But I get this error:
botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
Im noob in this, so any help would be good. How can I read text form my image or pdf file?
I have also added this block of code, but the error is still Unable to locate credentials.
session = boto3.Session(
aws_access_key_id='xxxxxxxxxxxx',
aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)
There is problem in passing credentials to boto3. You have to pass the credentials while creating boto3 client.
import boto3
# boto3 client
client = boto3.client(
'textract',
region_name='us-west-2',
aws_access_key_id='xxxxxxx',
aws_secret_access_key='xxxxxxx'
)
# Read image
with open('slika2.png', 'rb') as document:
img = bytearray(document.read())
# Call Amazon Textract
response = client.detect_document_text(
Document={'Bytes': img}
)
# Print detected text
for item in response["Blocks"]:
if item["BlockType"] == "LINE":
print ('\033[94m' + item["Text"] + '\033[0m')
Do note, it is not recommended to hardcode AWS Keys in code. Please refer following this document
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html

Google Cloud Storage + Python : Any way to list obj in certain folder in GCS?

I'm going to write a Python program to check if a file is in certain folder of my Google Cloud Storage, the basic idea is to get the list of all objects in a folder, a file name list, then check if the file abc.txt is in the file name list.
Now the problem is, it looks Google only provide the one way to get obj list, which is uri.get_bucket(), see below code which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects
uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
print ' "%s"' % obj.get_contents_as_string()
The defect of uri.get_bucket() is, it looks it is getting all of the object first, this is what I don't want, I just need get the obj name list of particular folder(e.g gs//mybucket/abc/myfolder) , which should be much quickly.
Could someone help answer? Appreciate every answer!
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent to the below code is:
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
print(str(blob))
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:
from apiclient import discovery
# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1') # add http=whatever param if auth
request = client.objects().list(
bucket="mybucket",
prefix="abc/myfolder")
while request is not None:
response = request.execute()
print json.dumps(response, indent=2)
request = request.list_next(request, response)
Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here:
https://code.google.com/p/google-api-python-client/
This worked for me:
client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)
blobs = bucket.list_blobs()
for blob in blobs:
print(blob.name)
The list_blobs() method will return an iterator used to find blobs in the bucket.
Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
This documentation helped me alot:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket
I hope I could help!
You might also want to look at gcloud-python and documentation.
from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')
for key in bucket:
if key.name == 'abc.txt':
print 'Found it!'
break
However, you might be better off just checking if the file exists:
if 'abc.txt' in bucket:
print 'Found it!'
Install python package google-cloud-storage by pip or pycharm and use below code
from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
print(str(blob))
I know this is an old question, but I stumbled over this because I was looking for the exact same answer. Answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to get into more detail.
When you run this:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
You will get Blob objects, with just the name field of all files in the given bucket, like this:
[<Blob: BUCKET_NAME, PREFIX, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>,
<Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
...]
If you are like me and you want to 1) filter out the first item in the list because it does NOT represent a file - its just the prefix, 2) just get the name string value, and 3) remove the PREFIX from the file name, you can do something like this:
blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != folder_name]
Complete code to get just the string files names from a storage bucket:
from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != folder_name]
print(f"blob_names = {blob_names}")

Python: Upload a photo to photobucket

Can a Python script upload a photo to photo bucket and then retrieve the URL for it? Is so how?
I found a script at this link: http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=240x677
But I just found that confusing.
many thanks,
Phil
Yes, you can. Photobucket has a well-documented API, and someone wrote a wrapper around it.
Download the it and put it into your Python path, then download httplib2 (you can use easy_install or pip for this one).
Then, you have to request a key for the Photobucket API.
If you did everything right, you can write your script now. The Python wrapper is great, but is not documented at all which makes it very difficult to use it. I spent hours on understanding it (compare the question and response time here). As example, the script even has form/multipart support, but I had to read the code to find out how to use it. I had to prefix the filename with a #.
This library is a great example how you should NOT document your code!
I finally got it working, enjoy the script: (it even has oAuth handling!)
import pbapi
import webbrowser
import cPickle
import os
import re
import sys
from xml.etree import ElementTree
__author__ = "leoluk"
###############################################
## CONFIGURATION ##
###############################################
# File in which the oAuth token will be stored
TOKEN_FILE = "token.txt"
IMAGE_PATH = r"D:\Eigene Dateien\Bilder\SC\foo.png"
IMAGE_RECORD = {
"type": 'image',
"uploadfile": '#'+IMAGE_PATH,
"title": "My title", # <---
"description": "My description", # <---
}
ALBUM_NAME = None # default album if None
API_KEY = "149[..]"
API_SECRET = "528[...]"
###############################################
## SCRIPT ##
###############################################
api = pbapi.PbApi(API_KEY, API_SECRET)
api.pb_request.connection.cache = None
# Test if service online
api.reset().ping().post()
result = api.reset().ping().post().response_string
ET = ElementTree.fromstring(result)
if ET.find('status').text != 'OK':
sys.stderr.write("error: Ping failed \n"+result)
sys.exit(-1)
try:
# If there is already a saved oAuth token, no need for a new one
api.username, api.pb_request.oauth_token = cPickle.load(open(TOKEN_FILE))
except (ValueError, KeyError, IOError, TypeError):
# If error, there's no valid oAuth token
# Getting request token
api.reset().login().request().post().load_token_from_response()
# Requesting user permission (you have to login with your account)
webbrowser.open_new_tab(api.login_url)
raw_input("Press Enter when you finished access permission. ")
#Getting oAuth token
api.reset().login().access().post().load_token_from_response()
# This is needed for getting the right subdomain
infos = api.reset().album(api.username).url().get().response_string
ET = ElementTree.fromstring(infos)
if ET.find('status').text != 'OK':
# Remove the invalid oAuth
os.remove(TOKEN_FILE)
# This happend is user deletes the oAuth permission online
sys.stderr.write("error: Permission deleted. Please re-run.")
sys.exit(-1)
# Fresh values for username and subdomain
api.username = ET.find('content/username').text
api.set_subdomain(ET.find('content/subdomain/api').text)
# Default album name
if not ALBUM_NAME:
ALBUM_NAME = api.username
# Debug :-)
print "User: %s" % api.username
# Save the new, valid oAuth token
cPickle.dump((api.username, api.oauth_token), open(TOKEN_FILE, 'w'))
# Posting the image
result = (api.reset().album(ALBUM_NAME).
upload(IMAGE_RECORD).post().response_string)
ET = ElementTree.fromstring(result)
if ET.find('status').text != 'OK':
sys.stderr.write("error: File upload failed \n"+result)
sys.exit(-1)
# Now, as an example what you could do now, open the image in the browser
webbrowser.open_new_tab(ET.find('content/browseurl').text)
Use the python API by Ron White that was written to do just this

Download a spreadsheet from Google Drive / Workspace using Python

Can you produce a Python example of how to download a Google Sheets spreadsheet given its key and worksheet ID (gid)? I can't.
I've scoured versions 1, 2 and 3 of the API. I'm having no luck, I can't figure out their compilcated ATOM-like feeds API, the gdata.docs.service.DocsService._DownloadFile private method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face due to frustration.
I have a few spreadsheets and I want to access them like so:
username = 'mygooglelogin#gmail.com'
password = getpass.getpass()
def get_spreadsheet(key, gid=0):
... (help!) ...
for row in get_spreadsheet('5a3c7f7dcee4b4f'):
cell1, cell2, cell3 = row
...
Please save my face.
Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (Docs for DocsService here)
import gdata.docs.service
import getpass
import os
import tempfile
import csv
def get_csv(file_path):
return csv.reader(file(file_path).readlines())
def get_spreadsheet(key, gid=0):
gd_client = gdata.docs.service.DocsService()
gd_client.email = 'xxxxxxxxx#gmail.com'
gd_client.password = getpass.getpass()
gd_client.ssl = False
gd_client.source = "My Fancy Spreadsheet Downloader"
gd_client.ProgrammaticLogin()
file_path = tempfile.mktemp(suffix='.csv')
uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
try:
entry = gd_client.GetDocumentListEntry(uri)
# XXXX - The following dies with RequestError "Unauthorized"
gd_client.Download(entry, file_path)
return get_csv(file_path)
finally:
try:
os.remove(file_path)
except OSError:
pass
The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets, rather than the old answers to this that suggest the gdata library which is not only too low-level, but is also overly-complicated.
You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey
Here's an example of how to use it:
import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)
docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"
client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
filename = docid + '-worksheet' + str(i) + '.csv'
with open(filename, 'wb') as f:
writer = csv.writer(f)
writer.writerows(worksheet.get_all_values())
In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:
#!/usr/bin/python
import re, urllib, urllib2
class Spreadsheet(object):
def __init__(self, key):
super(Spreadsheet, self).__init__()
self.key = key
class Client(object):
def __init__(self, email, password):
super(Client, self).__init__()
self.email = email
self.password = password
def _get_auth_token(self, email, password, source, service):
url = "https://www.google.com/accounts/ClientLogin"
params = {
"Email": email, "Passwd": password,
"service": service,
"accountType": "HOSTED_OR_GOOGLE",
"source": source
}
req = urllib2.Request(url, urllib.urlencode(params))
return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
def get_auth_token(self):
source = type(self).__name__
return self._get_auth_token(self.email, self.password, source, service="wise")
def download(self, spreadsheet, gid=0, format="csv"):
url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
headers = {
"Authorization": "GoogleLogin auth=" + self.get_auth_token(),
"GData-Version": "3.0"
}
req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
return urllib2.urlopen(req)
if __name__ == "__main__":
import getpass
import csv
email = "" # (your email here)
password = getpass.getpass()
spreadsheet_id = "" # (spreadsheet id here)
# Create client and spreadsheet objects
gs = Client(email, password)
ss = Spreadsheet(spreadsheet_id)
# Request a file-like object containing the spreadsheet's contents
csv_file = gs.download(ss)
# Parse as CSV and print the rows
for row in csv.reader(csv_file):
print ", ".join(row)
You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.
Get a separate login token for the spreadsheets service and substitue that for the export. Adding this to the get_spreadsheet code worked for me:
import gdata.spreadsheet.service
def get_spreadsheet(key, gid=0):
# ...
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.email = gd_client.email
spreadsheets_client.password = gd_client.password
spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
spreadsheets_client.ProgrammaticLogin()
# ...
entry = gd_client.GetDocumentListEntry(uri)
docs_auth_token = gd_client.GetClientLoginToken()
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token
Notice I also used Export, as Download seems to give only PDF files.
(Jul 2016) All other answers are pretty much outdated or will be, either because they use GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.
Modern Google API access occurs using API keys (for accessing public data), OAuth2 client IDs (for accessing data owned by users), or service accounts (for accessing data owned by applications/in the cloud) primarily with the Google Cloud client libraries for GCP APIs and Google APIs Client Libraries for non-GCP APIs. For this task, it would be the latter for Python.
To make it happen your code needs authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.
Note that there is also a Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based request like exporting where the Drive API is the correct one to use.
I wrote a blog post that demos exporting a Google Sheet as CSV from Drive. The core part of the script:
# setup
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
# query for file to export
files = DRIVE.files().list(
q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE), orderBy='modifiedTime desc,name').execute().get('files', [])
# export 1st match (if found)
if files:
fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
if data:
with open(fn, 'wb') as f:
f.write(data)
print('DONE')
To learn more about using Google Sheets with Python, see my answer for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.
If you're completely new to Google APIs, then you need to take a further step back and review these videos first:
How to use Google APIs & create API projects -- the UI has changed but the concepts are still the same
Walkthrough of authorization boilerplate code (Python) -- you can use any supported language to access Google APIs; if you don't do Python, use it as pseudocode to help get you started
Listing your files in Google Drive and code deep dive post
If you already have experience with Google Workspace (formerly G Suite, Google Apps, Google "Docs") APIs and want to see more videos on using both APIs:
Sheets API video library
Drive API video library
Google Workspace (G Suite) Dev Show video series I produced
This no longer works as of gdata 2.0.1.4:
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
Instead, you have to do:
gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))
I wrote pygsheets as an alternative to gspread, but using google api v4. It has an export method to export spreadsheet.
import pygsheets
gc = pygsheets.authorize()
# Open spreadsheet and then workseet
sh = gc.open('my new ssheet')
wks = sh.sheet1
#export as csv
wks.export(pygsheets.ExportType.CSV)
The following code works in my case (Ubuntu 10.4, python 2.6.5 gdata 2.0.14)
import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.auth_token = docs_token
I've simplified #Cameron's answer even further, by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the url, which might work better.
#!/usr/bin/python
import re, urllib, urllib2
def get_auth_token(email, password):
url = "https://www.google.com/accounts/ClientLogin"
params = {
"Email": email, "Passwd": password,
"service": 'wise',
"accountType": "HOSTED_OR_GOOGLE",
"source": 'Client'
}
req = urllib2.Request(url, urllib.urlencode(params))
return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]
def download(spreadsheet, worksheet, email, password, format="csv"):
url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s#gid=%s'
headers = {
"Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
"GData-Version": "3.0"
}
req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
return urllib2.urlopen(req)
if __name__ == "__main__":
import getpass
import csv
spreadsheet_id = "" # (spreadsheet id here)
worksheet_id = '' # (gid here)
email = "" # (your email here)
password = getpass.getpass()
# Request a file-like object containing the spreadsheet's contents
csv_file = download(spreadsheet_id, worksheet_id, email, password)
# Parse as CSV and print the rows
for row in csv.reader(csv_file):
print ", ".join(row)
I'm using this:
curl 'https://docs.google.com/spreadsheets/d/1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv' on a sheet that is set to publicly readable.
So you would need a python version of curl, if you can work with public sheets.
If you have a sheet with some tabs you don't want to reveal, create a new sheet, and import the ranges you want to publish into tabs on it.
Downloading a spreadsheet from google doc is pretty simple using sheets.
You can follow the detailed documentation on
https://pypi.org/project/gsheets/
or follow the below-given steps. I recommend reading through the documentation for better coverage.
pip install gsheets
Log in to the Google Developers Console with the Google account whose spreadsheets you want to access. Create (or select) a project and enable the Drive API and Sheets API (under Google Apps APIs).
Go to the Credentials for your project and create New credentials > OAuth client ID > of type Other. In the list of your OAuth 2.0 client IDs click Download JSON for the Client ID you just created. Save the file as client_secrets.json in your home directory (user directory).
Use the following code snippet.
from gsheets import Sheets
sheets = Sheets.from_files('client_secret.json')
print(sheets) # will ensure authenticate connection
s = sheets.get("{SPREADSHEET_URL}")
print(s) # will ensure your file is accessible
s.sheets[1].to_csv('Spam.csv', encoding='utf-8', dialect='excel') # will download the file as csv
This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engline + Python. Not having any experience in the area, I cannot see exactly what portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.
Google AppEngine + Google Docs + Some Python = Simple CMS
Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think that this code is even quicker than the earlier answer to get the contents of the sheet:
username = 'sdfsdfsds#gmail.com'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"
client = gspread.login(username, password)
spreadsheet = client.open(sheetname)
worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
contents.append(rows)
(Mar 2019, Python 3) My data is usually not sensitive and I use usually table format similar to CSV.
In such case, one can simply publish to the web the sheet and than use it as a CSV file on a server.
(One publishes it using File -> Publish to the web ... -> Sheet 1 -> Comma separated values (.csv) -> Publish).
import csv
import io
import requests
url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv" # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
data.append(row)

Categories

Resources