Read formulas in Google Sheets cells using Python

I am trying to download a Google Sheets document as a Microsoft Excel document using Python. I have been able to accomplish this task using the Python module googleapiclient.
However, the Sheets document may contain some formulas which are not compatible with Microsoft Excel (https://www.dataeverywhere.com/article/27-incompatible-formulas-between-excel-and-google-sheets/).
When I use the application I created on any Google Sheets document that used any of these formulas anywhere, I get a bogus Microsoft Excel document as output.
I would like to read the cell values in the Google Sheets document before downloading it as a Microsoft Excel document, just to prevent any such errors from happening.
The code I have written thus far is attached below:
import sys
import os
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools
SCOPES = "https://www.googleapis.com/auth/drive.readonly"
store = file.Storage("./credentials/credentials.json")
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets("credentials/client_secret.json",
                                          SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build("drive", "v3", http = creds.authorize(Http()))
print("Usage: tmp.py <name of the spreadsheet>")
FILENAME = sys.argv[1]
SRC_MIMETYPE = "application/vnd.google-apps.spreadsheet"
DST_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
files = DRIVE.files().list(
    q = 'name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy = "modifiedTime desc,name").execute().get("files", [])
if files:
    fn = '%s.xlsx' % os.path.splitext(files[0]["name"].replace(" ", "_"))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end = "")
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, "wb") as f:
            f.write(data)
        print("Done")
    else:
        print("ERROR: Could not download file")
else:
    print("ERROR: File not found")

If you want to use Python to export something from Google Docs, the simplest way is to let Google's own servers do the job for you.
While doing a little web scraping on Google Sheets, I wrote this small program. You just have to insert the ID of the document you want to download.
I put in a temporary ID so anyone can try it out.
import requests
ext = 'xlsx' #csv, ods, html, tsv and pdf can be used as well
key = '1yEoHh7WL1UNld-cxJh0ZsRmNwf-69uINim2dKrgzsLg'
url = f'https://docs.google.com/spreadsheets/d/{key}/export?format={ext}'
res = requests.get(url)
with open(f'file.{ext}', 'wb') as f:
    f.write(res.content)
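The conversion should match what you get from the browser, because this is the same as clicking the export button in the browser version of Google Sheets. Note that an unauthenticated request like this only works for documents shared so that anyone with the link can view them.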
If you are planning to work with the data inside python, then I recommend using csv format instead of xlsx, and then create the necessary formulas inside python.
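As a rough illustration of the "do the formulas in Python" suggestion, the sketch below sums one column of an exported CSV using only the standard library. The function name column_total and the sample column names are made up for this example; csv_bytes stands for res.content from the request above with ext = 'csv'.

```python
import csv
import io


def column_total(csv_bytes, column):
    """Sum the numeric values of one named column in CSV data given as bytes."""
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
    total = 0.0
    for row in reader:
        # Missing or empty cells are skipped instead of raising an error.
        text = (row.get(column) or "").strip()
        if text:
            total += float(text)
    return total


# Hypothetical sample data standing in for a downloaded sheet.
sample = b"item,price\napple,1.5\npear,2.5\nplum,\n"
print(column_total(sample, "price"))
```

This replaces something like =SUM(B2:B4) in the sheet; for anything more involved, pandas is probably the more comfortable tool.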

I think the gspread library might be what you are looking for. https://gspread.readthedocs.io/en/latest/
Here's a code sample:
import tenacity
import gspread
from oauth2client.service_account import ServiceAccountCredentials
@tenacity.retry(wait=tenacity.wait_exponential()) # If you exceed the Google API quota, this waits to retry your request
def loadGoogleSheet(spreadsheet_name):
    # use creds to create a client to interact with the Google Drive API
    print("Connecting to Google API...")
    scope = [
        'https://spreadsheets.google.com/feeds',
        'https://www.googleapis.com/auth/drive'
    ]
    creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
    client = gspread.authorize(creds)
    spreadsheet = client.open(spreadsheet_name)
    return spreadsheet
def readGoogleSheet(spreadsheet):
    sheet = spreadsheet.sheet1 # Might need to loop through sheets or whatever
    val = sheet.cell(1, 1).value # This just gets the value of the first cell. The docs I linked to above are pretty helpful on all the other stuff you can do
    return val
test_spreadsheet = loadGoogleSheet('Copy of TLO Summary - Template DO NOT EDIT')
test_output = readGoogleSheet(test_spreadsheet)
print(test_output)
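Since the question is about spotting problem formulas before exporting, here is a hedged sketch of the checking step only. It assumes the cells have already been fetched as formula text, for example through the Sheets API spreadsheets.values.get method with valueRenderOption="FORMULA" (which needs a Sheets scope in addition to the Drive one). The SHEETS_ONLY set is a small illustrative sample, not a complete list, and the function names are hypothetical.

```python
import re

# Illustrative sample of functions that exist in Google Sheets but not in Excel.
SHEETS_ONLY = {"QUERY", "IMPORTRANGE", "GOOGLEFINANCE", "IMPORTXML", "IMPORTHTML"}

# A function call looks like NAME( inside a formula.
_FUNC_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\s*\(")


def column_letter(index):
    """Convert a 0-based column index to A1-style letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_sheets_only_formulas(rows, banned=SHEETS_ONLY):
    """Return (cell, function) pairs for formulas that call a banned function.

    rows is a list of rows, each a list of cell contents as formula text.
    """
    hits = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if not (isinstance(cell, str) and cell.startswith("=")):
                continue  # plain value, not a formula
            for name in _FUNC_NAME.findall(cell):
                if name.upper() in banned:
                    hits.append((f"{column_letter(c)}{r + 1}", name.upper()))
    return hits
```

If the returned list is non-empty, the export can be skipped or reported instead of producing a broken .xlsx file. The regular expression is deliberately simple, so text inside quoted strings that happens to look like a function call would also be flagged.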

Related

Downloading all tabs of a spreadsheet Google Drive API

I'm trying to download the full content of a spreadsheet using google Drive. Currently, my code is exporting and then writing to a file the content from the first tab from the given spreadsheet only. How can I make it download the full content of the file?
This is the function that I'm currently using:
def download_file(real_file_id, service):
    try:
        file_id = real_file_id
        request = service.files().export_media(fileId=file_id,
                                               mimeType='text/csv')
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}.')
    except HttpError as error:
        print(F'An error occurred: {error}')
        file = None
    file_object = open('test.csv', 'a')
    file_object.write(file.getvalue().decode("utf-8"))
    file_object.close()
    return file.getvalue()
I call the function at a later stage in my code by passing the already initialised google drive service and the file id
download_file(real_file_id='XXXXXXXXXXXXXXXXXXXXX', service=service)
I believe your goal is as follows.
You want to download all sheets in a Google Spreadsheet as CSV data.
You want to achieve this using googleapis for python.
In that case, how about the following sample script? It uses the Sheets API to retrieve the title and sheet ID of every sheet in the Google Spreadsheet, and then downloads each sheet as CSV data using those sheet IDs.
Sample script:
From the script you showed, I guessed that service is service = build("drive", "v3", credentials=creds). If my understanding is correct, please use creds to retrieve the access token.
spreadsheetId = "###" # Please set the Spreadsheet ID.
sheets = build("sheets", "v4", credentials=creds)
sheetObj = sheets.spreadsheets().get(spreadsheetId=spreadsheetId, fields="sheets(properties(sheetId,title))").execute()
accessToken = creds.token
for s in sheetObj.get("sheets", []):
    p = s["properties"]
    sheetName = p["title"]
    print("Download: " + sheetName)
    url = "https://docs.google.com/spreadsheets/export?id=" + spreadsheetId + "&exportFormat=csv&gid=" + str(p["sheetId"])
    res = requests.get(url, headers={"Authorization": "Bearer " + accessToken})
    with open(sheetName + ".csv", mode="wb") as f:
        f.write(res.content)
This script also needs import requests.
When this script is run, all sheets in a Google Spreadsheet are downloaded as CSV data. The filename of each CSV file uses the tab name in Google Spreadsheet.
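Because the filenames come straight from the tab names, a tab title containing a character such as "/" would make open() fail or write somewhere unexpected. A minimal sketch of one way to guard against that follows; safe_filename is a hypothetical helper, and the set of replaced characters is a conservative guess rather than an official list.

```python
import re


def safe_filename(sheet_title, extension="csv"):
    """Turn a sheet title into a filename that is safe on common filesystems."""
    # Replace path separators, reserved punctuation and control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", sheet_title).strip(" .")
    # Fall back to a generic name if nothing usable is left.
    return f"{cleaned or 'sheet'}.{extension}"


print(safe_filename("Q1/Q2: totals"))
```

In the script above, open(sheetName + ".csv", mode="wb") could then become open(safe_filename(sheetName), mode="wb").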
You also need to add the scope "https://www.googleapis.com/auth/spreadsheets.readonly" as follows, and then reauthorize so that the new scope takes effect.
SCOPES = [
    "https://www.googleapis.com/auth/drive.readonly", # Please use this for your actual situation.
    "https://www.googleapis.com/auth/spreadsheets.readonly",
]
Reference:
Method: spreadsheets.get
Tanaike's answer is easier and more straightforward, but I already spent some time on this so I might as well post it as an alternative.
The problem you originally encountered is that CSV files do not support multiple tabs/sheets, so Drive's files.export will only export the first sheet, and it doesn't have a way to select specific sheets.
Another way you can approach this is to use the Sheets API copyTo() method to create temp files for each sheet and export those as single CSV files.
# need a service for sheets and one for drive
sheetservice = build('sheets', 'v4', credentials=creds)
driveservice = build('drive', 'v3', credentials=creds)
spreadsheet = sheetservice.spreadsheets()
result = spreadsheet.get(spreadsheetId=YOUR_SPREADSHEET).execute()
sheets = result.get('sheets', []) # the list of sheets within your spreadsheet
# standard metadata to create the blank spreadsheet files
file_metadata = {
    "name":"temp",
    "mimeType":"application/vnd.google-apps.spreadsheet"
}
for sheet in sheets:
    # create a blank spreadsheet and get its ID
    tempfile = driveservice.files().create(body=file_metadata).execute()
    tempid = tempfile.get('id')
    # copy the sheet to the new file
    sheetservice.spreadsheets().sheets().copyTo(spreadsheetId=YOUR_SPREADSHEET, sheetId=sheet['properties']['sheetId'], body={"destinationSpreadsheetId":tempid}).execute()
    # need to delete the first sheet since the copy gets added as second
    # ("requests" is documented as a list of request objects)
    sheetservice.spreadsheets().batchUpdate(spreadsheetId=tempid, body={"requests":[{"deleteSheet":{"sheetId":0}}]}).execute()
    download_file(tempid, driveservice) # runs your original method to download the file
    driveservice.files().delete(fileId=tempid).execute() # to clean up the temp file
You'll also need the https://www.googleapis.com/auth/spreadsheets and https://www.googleapis.com/auth/drive scopes. This involves more API calls, so I'd still recommend Tanaike's method, but I hope it gives you an idea of how you can play with the API to suit your needs.

Google spreadsheet to Pandas dataframe via Pydrive without download

How do I read the content of a Google spreadsheet into a Pandas dataframe without downloading the file?
I think gspread or df2gspread may be good shots, but I've been working with pydrive so far and got close to the solution.
With Pydrive I managed to get the export link of my spreadsheet, either as .csv or .xlsx file. After the authentication process, this looks like
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
# choose whether to export csv or xlsx
data_type = 'csv'
# get list of files in folder as dictionaries
file_list = drive.ListFile({'q': "'my-folder-ID' in parents and trashed=false"}).GetList()
export_key = 'exportLinks'
excel_key = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
csv_key = 'text/csv'
if data_type == 'excel':
    urls = [ file[export_key][excel_key] for file in file_list ]
elif data_type == 'csv':
    urls = [ file[export_key][csv_key] for file in file_list ]
The type of url I get for xlsx is
https://docs.google.com/spreadsheets/export?id=my-id&exportFormat=xlsx
and similarly for csv
https://docs.google.com/spreadsheets/export?id=my-id&exportFormat=csv
Now, if I click on these links (or visit them with webbrowser.open(url)), I download the file, that I can then normally read into a Pandas dataframe with pandas.read_excel() or pandas.read_csv(), as described here.
How can I skip the download, and directly read the file into a dataframe from these links?
I tried several solutions:
The obvious pd.read_csv(url) gives
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
Interestingly, these numbers (1, 6, 2) do not depend on the number of rows and columns in my spreadsheet, which suggests that the script is not reading the content I intended.
The analogue pd.read_excel(url) gives
ValueError: Excel file format cannot be determined, you must specify an engine manually.
and specifying e.g. engine = 'openpyxl' gives
zipfile.BadZipFile: File is not a zip file
BytesIO solution looked promising, but
r = requests.get(url)
data = r.content
df = pd.read_csv(BytesIO(data))
still gives
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
If I print(data) I get hundreds of lines of html code
b'\n<!DOCTYPE html>\n<html lang="de">\n <head>\n <meta charset="utf-8">\n <meta content="width=300, initial-scale=1" name="viewport">\n
...
...
</script>\n </body>\n</html>\n'
In your situation, how about the following modification? In this case, by retrieving the access token from gauth, the Spreadsheet is exported as XLSX data, and the XLSX data is put into the dataframe.
Modified script:
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
url = "https://docs.google.com/spreadsheets/export?id={spreadsheetId}&exportFormat=xlsx"
res = requests.get(url, headers={"Authorization": "Bearer " + gauth.attr['credentials'].access_token})
values = pd.read_excel(BytesIO(res.content))
print(values)
In this script, please add import requests.
In this case, the 1st tab of XLSX data is used.
When you want to use the other tab, please modify values = pd.read_excel(BytesIO(res.content)) as follows.
sheet = "Sheet2"
values = pd.read_excel(BytesIO(res.content), sheet_name=sheet)
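The parser errors in the question all come from the same cause: without an access token the export URL returns an HTML sign-in page rather than spreadsheet data. A small check before handing the bytes to pandas makes that failure easier to recognise. This is only a sketch; looks_like_html and ensure_spreadsheet_bytes are hypothetical helper names, and the check is a simple heuristic on the first bytes of the response.

```python
def looks_like_html(payload: bytes) -> bool:
    """Heuristically detect an HTML page (e.g. a login page) in a response body."""
    head = payload[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def ensure_spreadsheet_bytes(payload: bytes) -> bytes:
    """Return payload unchanged, or raise if it is an HTML page instead of data."""
    if looks_like_html(payload):
        raise ValueError(
            "Got an HTML page instead of spreadsheet data; "
            "the request was probably not authorized."
        )
    return payload
```

With this in place, pd.read_excel(BytesIO(ensure_spreadsheet_bytes(res.content))) fails with a message that points at authorization instead of at the file format.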
I want to contribute an additional option to @Tanaike's excellent answer. It is indeed quite difficult to get an Excel file (an .xlsx stored on Drive, not a Google Sheet) into a Python environment without publishing the content to the web. Whereas the previous answer uses pydrive and GoogleAuth(), I usually use a different authentication method in Colab/Jupyter notebooks, adapted from the googleapis documentation. In my environment, wrapping the response in BytesIO(response.content) is unnecessary.
import pandas as pd
from oauth2client.client import GoogleCredentials
from google.colab import auth
auth.authenticate_user()
from google.auth.transport.requests import AuthorizedSession
from google.auth import default
creds, _ = default()
id = 'aaaaaaaaaaaaaaaaaaaaaaaaaaa'
sheet = 'Sheet12345'
url = f'https://docs.google.com/spreadsheets/export?id={id}&exportFormat=xlsx'
authed_session = AuthorizedSession(creds)
response = authed_session.get(url)
values = pd.read_excel(response.content, sheet_name=sheet)

Export data from the Google Sheet to PDF PYTHON

I am getting all the data present in a Google Sheet using the code below.
I want to write all this data to a PDF file and download it.
import gspread
import sys
print(sys.path)
import os
#sys.path.append('/usr/lib/python3/dist-packages')
from oauth2client.service_account import ServiceAccountCredentials
scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
path = os.path.abspath('cred.json')
credentials=ServiceAccountCredentials.from_json_keyfile_name('cred.json',scope)
client=gspread.authorize(credentials)
sheet=client.open('xyz').sheet1
data=sheet.get_all_records()
print(data)
I believe your goal is as follows.
You want to export the Google Spreadsheet xyz as a PDF file using gspread with Python and the service account.
Modification points:
Unfortunately, it seems that at the current stage a Spreadsheet cannot be directly exported as a PDF file using gspread. So in this case, the requests library and the endpoint for exporting a Spreadsheet to PDF are used.
When these points are reflected in your script, it becomes as follows.
Modified script:
import gspread
import requests
import sys
print(sys.path)
import os
#sys.path.append('/usr/lib/python3/dist-packages')
from oauth2client.service_account import ServiceAccountCredentials
scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
path = os.path.abspath('cred.json')
creds=ServiceAccountCredentials.from_json_keyfile_name('cred.json',scope)
client=gspread.authorize(creds)
# I added below script
spreadsheet_name = 'xyz'
spreadsheet = client.open(spreadsheet_name)
url = 'https://docs.google.com/spreadsheets/export?format=pdf&id=' + spreadsheet.id
headers = {'Authorization': 'Bearer ' + creds.create_delegated("").get_access_token().access_token}
res = requests.get(url, headers=headers)
with open(spreadsheet_name + ".pdf", 'wb') as f:
    f.write(res.content)
Note:
This modified script assumes that you can already read values from the Google Spreadsheet using the Sheets API.
If an error related to the Drive API occurs, please enable the Drive API in the API console.
If an error related to the service account occurs, please change create_delegated("") to create_delegated("email of the service account").

JSON to Google sheets using Python

I am trying to find a way to transfer the values from JSON to a Google Sheet.
The values in the JSON file look something like {"someone_name1@gmail.com": 4, "someone_name2.com": 4}, and they keep updating on each run of the script. However, I am getting an error when it tries to put the values in the sheet.
sheet4.update_cells(1, 1, results)
TypeError: update_cells() takes from 2 to 3 positional arguments but 4 were given
Here is the code below. Any ideas what am I doing wrong and how can I fix this? I tried researching this but not able to find a suitable answer. As I am new to coding and python, I am unable to figure this one out. Any help appreciated. :)
# all functions imported
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import json
from collections import Counter
# login & open sheet sheets
scope = ["https://spreadsheets.google.com/feeds", 'https://www.googleapis.com/auth/spreadsheets',
"https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
credentials = ServiceAccountCredentials.from_json_keyfile_name('myfile-b16b15370c5b.json', scope)
client = gspread.authorize(credentials)
sheet4 = client.open('Dashboard').worksheet('Sheet4') # Open the spreadsheet
counter_file_path = "counter.json"
with open(counter_file_path, "r") as f:
    email_stats = json.load(f)
results = []
for key in email_stats:
    results.append([key, email_stats[key]])
sheet4.update_cells(1, 1, results)
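The traceback arises because update_cells expects a list of Cell objects, not a row, a column and the values. One possible fix, sketched here on the assumption that a reasonably recent gspread is installed, is Worksheet.update, which takes a range and a list of rows. The positional order of those two arguments has changed between gspread releases, so the sketch passes both by keyword. rows_from_counts and write_counts are hypothetical helper names.

```python
def rows_from_counts(counts):
    """Convert a {key: count} dict into a list of [key, count] rows."""
    return [[key, value] for key, value in counts.items()]


def write_counts(worksheet, counts, top_left="A1"):
    """Write the counts to the worksheet starting at top_left; return the rows.

    worksheet is expected to be a gspread Worksheet (assumption).
    """
    rows = rows_from_counts(counts)
    if rows:
        # Keyword arguments sidestep the argument-order change between releases.
        worksheet.update(range_name=top_left, values=rows)
    return rows
```

In the script above this would be called as write_counts(sheet4, email_stats) in place of the final update_cells line.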

Getting File Metadata from Google API V3 in Python

I am trying to retrieve file metadata from Google drive API V3 in Python. I did it in API V2, but failed in V3.
I tried to get metadata by this line:
data = DRIVE.files().get(fileId=file['id']).execute()
but all I got was a dict of 'id', 'kind', 'name', and 'mimeType'. How can I get 'md5Checksum', 'fileSize', and so on?
I read the documentation.
I am supposed to get all the metadata by get() methods, but all I got was a small part of it.
Here is my code:
from __future__ import print_function
import os
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
SCOPES = 'https://www.googleapis.com/auth/drive.metadata https://www.googleapis.com/auth/drive'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('storage.json', scope=SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = build('drive','v3', http=creds.authorize(Http()))
files = DRIVE.files().list().execute().get('files',[])
for file in files:
    print('\n',file['name'],file['id'])
    data = DRIVE.files().get(fileId=file['id']).execute()
    print('\n',data)
print('Done')
I tried this answer:
Google Drive API v3 Migration
List
Files returned by service.files().list() do not contain information now, i.e. every field is null. If you want list on v3 to behave like in v2, call it like this:
service.files().list().setFields("nextPageToken, files");
but I get a Traceback:
DRIVE.files().list().setFields("nextPageToken, files")
AttributeError: 'HttpRequest' object has no attribute 'setFields'
To get the MD5 hash of a file given its fileId, request that field explicitly:
DRIVE = build('drive','v3', http=creds.authorize(Http()))
file_service = DRIVE.files()
remote_file_hash = file_service.get(fileId=fileId, fields="md5Checksum").execute()['md5Checksum']
To list some files on the Drive:
results = file_service.list(pageSize=10, fields="files(id, name)").execute()
I have built a small application gDrive-auto-sync containing more examples of API usage.
It's well-documented, so you can have a look at it if you want.
Here is the main file containing all the code. It might look like a lot but more than half of lines are just comments.
If you want to retrieve all the fields for a file resource, simply set fields='*'
In your above example, you would run
data = DRIVE.files().get(fileId=file['id'], fields='*').execute()
This should return all the available resources for the file as listed in:
https://developers.google.com/drive/v3/reference/files
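Putting the fields parameter together with paging, the following sketch lists every file with a chosen set of metadata fields. It assumes drive is the object returned by build('drive', 'v3', ...); iter_files is a hypothetical helper name. Note that in v3 the size field is called size rather than v2's fileSize, and md5Checksum is only present for files with binary content, so both are read with .get() in the usage note.

```python
def iter_files(drive, file_fields="id, name, md5Checksum, size", page_size=100):
    """Yield file metadata dicts from Drive v3, following nextPageToken."""
    page_token = None
    while True:
        response = drive.files().list(
            pageSize=page_size,
            pageToken=page_token,  # None on the first request
            fields=f"nextPageToken, files({file_fields})",
        ).execute()
        yield from response.get("files", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Usage would look like: for f in iter_files(DRIVE): print(f['name'], f.get('md5Checksum'), f.get('size')).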
There is a library, PyDrive, that provides easy interaction with Google Drive:
https://googledrive.github.io/PyDrive/docs/build/html/filelist.html
Their example:
from pydrive.drive import GoogleDrive
drive = GoogleDrive(gauth) # Create GoogleDrive instance with authenticated GoogleAuth instance
# Auto-iterate through all files in the root folder.
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
    print('title: %s, id: %s' % (file1['title'], file1['id']))
Each entry can be indexed like a dict, so all you need is file1['your key'].
