Export data from the Google Sheet to PDF PYTHON - python

I am getting all the data in a Google Sheet using the code below.
I want to write all of this data to a PDF file and download it.
import gspread
import sys
print(sys.path)
import os
#sys.path.append('/usr/lib/python3/dist-packages')
from oauth2client.service_account import ServiceAccountCredentials
scope = [
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive'
]
path = os.path.abspath('cred.json')
credentials=ServiceAccountCredentials.from_json_keyfile_name('cred.json',scope)
client=gspread.authorize(credentials)
sheet=client.open('xyz').sheet1
data=sheet.get_all_records()
print(data)

I believe your goal is as follows.
You want to export the Google Spreadsheet xyz as a PDF file using gspread with Python and a service account.
Modification points:
Unfortunately, it seems that at the current stage, a Spreadsheet cannot be directly exported as a PDF file using gspread. So in this case, the requests library and the endpoint for exporting a Spreadsheet to PDF are used.
When these points are reflected in your script, it becomes as follows.
Modified script:
import gspread
import requests
import sys
print(sys.path)
import os
#sys.path.append('/usr/lib/python3/dist-packages')
from oauth2client.service_account import ServiceAccountCredentials
scope = [
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive'
]
path = os.path.abspath('cred.json')
creds=ServiceAccountCredentials.from_json_keyfile_name('cred.json',scope)
client=gspread.authorize(creds)
# I added below script
spreadsheet_name = 'xyz'
spreadsheet = client.open(spreadsheet_name)
url = 'https://docs.google.com/spreadsheets/export?format=pdf&id=' + spreadsheet.id
headers = {'Authorization': 'Bearer ' + creds.create_delegated("").get_access_token().access_token}
res = requests.get(url, headers=headers)
with open(spreadsheet_name + ".pdf", 'wb') as f:
    f.write(res.content)
Note:
This modified script assumes that you have already been able to get values from the Google Spreadsheet using the Sheets API. Please be careful about this.
If an error related to the Drive API occurs, please enable the Drive API in the API console.
If an error related to the service account occurs, please change create_delegated("") to create_delegated("email of the service account").
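The export request above comes down to a URL with two query parameters plus a bearer-token header. A minimal sketch of those two pieces, assuming the access token has already been obtained as in the modified script. build_export_url and auth_header are hypothetical helper names, not part of gspread, and the ID and token values are placeholders:

```python
from urllib.parse import urlencode

EXPORT_ENDPOINT = "https://docs.google.com/spreadsheets/export"


def build_export_url(spreadsheet_id, fmt="pdf"):
    """Return the export URL for a spreadsheet ID and a target format."""
    # urlencode escapes the values, so the ID is inserted safely.
    return EXPORT_ENDPOINT + "?" + urlencode({"format": fmt, "id": spreadsheet_id})


def auth_header(access_token):
    """Return the header dict that carries an OAuth2 bearer token."""
    return {"Authorization": "Bearer " + access_token}


# Placeholder values for illustration only.
url = build_export_url("my-spreadsheet-id")
headers = auth_header("my-access-token")
print(url)
```

The resulting url and headers can be passed to requests.get(url, headers=headers) exactly as in the script above.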

Related

Google spreadsheet to Pandas dataframe via Pydrive without download

How do I read the content of a Google spreadsheet into a Pandas dataframe without downloading the file?
I think gspread or df2gspread may be good shots, but I've been working with pydrive so far and got close to the solution.
With Pydrive I managed to get the export link of my spreadsheet, either as .csv or .xlsx file. After the authentication process, this looks like
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
# choose whether to export csv or xlsx
data_type = 'csv'
# get list of files in folder as dictionaries
file_list = drive.ListFile({'q': "'my-folder-ID' in parents and trashed=false"}).GetList()
export_key = 'exportLinks'
excel_key = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
csv_key = 'text/csv'
if data_type == 'excel':
    urls = [ file[export_key][excel_key] for file in file_list ]
elif data_type == 'csv':
    urls = [ file[export_key][csv_key] for file in file_list ]
The type of url I get for xlsx is
https://docs.google.com/spreadsheets/export?id=my-id&exportFormat=xlsx
and similarly for csv
https://docs.google.com/spreadsheets/export?id=my-id&exportFormat=csv
Now, if I click on these links (or visit them with webbrowser.open(url)), the file is downloaded, and I can then read it into a Pandas dataframe as usual with pandas.read_excel() or pandas.read_csv(), as described here.
How can I skip the download, and directly read the file into a dataframe from these links?
I tried several solutions:
The obvious pd.read_csv(url) gives
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
Interestingly, these numbers (1, 6, 2) do not depend on the number of rows and columns in my spreadsheet, which suggests that the script is not reading the data it is supposed to read.
The analogue pd.read_excel(url) gives
ValueError: Excel file format cannot be determined, you must specify an engine manually.
and specifying e.g. engine = 'openpyxl' gives
zipfile.BadZipFile: File is not a zip file
A BytesIO solution looked promising, but
r = requests.get(url)
data = r.content
df = pd.read_csv(BytesIO(data))
still gives
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
If I print(data), I get hundreds of lines of HTML code
b'\n<!DOCTYPE html>\n<html lang="de">\n <head>\n <meta charset="utf-8">\n <meta content="width=300, initial-scale=1" name="viewport">\n
...
...
</script>\n </body>\n</html>\n'
In your situation, how about the following modification? In this case, by retrieving the access token from gauth, the Spreadsheet is exported as XLSX data, and the XLSX data is put into the dataframe.
Modified script:
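from io import BytesIO

import pandas as pd
import requests
from pydrive.auth import GoogleAuth

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
# Replace {spreadsheetId} with the ID of your Spreadsheet.
url = "https://docs.google.com/spreadsheets/export?id={spreadsheetId}&exportFormat=xlsx"
res = requests.get(url, headers={"Authorization": "Bearer " + gauth.attr['credentials'].access_token})
values = pd.read_excel(BytesIO(res.content))
print(values)
This script requires the requests library, so please make sure import requests is included.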
In this case, the first tab of the XLSX data is used.
When you want to use another tab, please modify values = pd.read_excel(BytesIO(res.content)) as follows.
sheet = "Sheet2"
values = pd.read_excel(BytesIO(res.content), sheet_name=sheet)
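The errors in the question (the tokenizing error from read_csv and "File is not a zip file" from read_excel) are what a parser reports when it is handed an HTML page instead of the exported file, which is what the request without an access token appears to return here. A small sketch that checks the payload before parsing, assuming only the well-known file signatures (XLSX files are ZIP archives starting with PK, PDF files start with %PDF-). sniff_payload is a hypothetical helper:

```python
def sniff_payload(data):
    """Roughly classify downloaded bytes as 'xlsx', 'pdf', 'html' or 'unknown'."""
    if data.startswith(b"PK\x03\x04"):
        return "xlsx"  # any ZIP container; XLSX is one of them
    if data.startswith(b"%PDF-"):
        return "pdf"
    # HTML pages may start with whitespace and use any letter case.
    head = data.lstrip()[:64].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"  # most likely a sign-in or error page
    return "unknown"


html_page = b'\n<!DOCTYPE html>\n<html lang="de">\n <head>\n'
print(sniff_payload(html_page))
print(sniff_payload(b"PK\x03\x04rest-of-archive"))
```

Calling this on res.content before pd.read_excel turns a confusing parser error into an explicit check.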
I want to contribute an additional option to @Tanaike's excellent answer. It is indeed quite difficult to get an Excel file (an .xlsx stored in Drive, not a Google Sheet) into a Python environment without publishing the content to the web. Whereas the previous answer uses pydrive and GoogleAuth(), I usually use a different method of authentication in Colab/Jupyter notebooks, adapted from the googleapis documentation. In my environment, wrapping the content in BytesIO(response.content) is unnecessary.
import pandas as pd
from oauth2client.client import GoogleCredentials
from google.colab import auth
auth.authenticate_user()
from google.auth.transport.requests import AuthorizedSession
from google.auth import default
creds, _ = default()
id = 'aaaaaaaaaaaaaaaaaaaaaaaaaaa'
sheet = 'Sheet12345'
url = f'https://docs.google.com/spreadsheets/export?id={id}&exportFormat=xlsx'
authed_session = AuthorizedSession(creds)
response = authed_session.get(url)
values = pd.read_excel(response.content, sheet_name=sheet)

credential JSON file to google colab

I'm trying to move a Python Jupyter scraper script (and json cred file) from my laptop to Google Colab.
I've made a connection between Google Colab and Google Drive.
I've stored the (.ipynb) script and credential JSON file on Google Drive.
However I can't make the connection between the 2 (gdrive json cred file and colab) to make it work.
Here below the part of the script concerning the credentials handling:
# Sheet key
# 1i1bmMt-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_d7Eo
import gspread
import pandas as pd
import requests
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
# Access credentials for google sheet and access the google sheet
scope = ["https://spreadsheets.google.com/feeds",
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive"]
# Copy your path to your credential JSON file.
PATH_TO_CREDENTIAL = '/Users/user/json-keys/client_secret.json'
# Initiate your credential
credentials = ServiceAccountCredentials.from_json_keyfile_name(PATH_TO_CREDENTIAL, scope)
# Authorize your connection to your google sheet
gc = gspread.authorize(credentials)
I receive a FileNotFoundError and credential errors.
I hope someone can help me with this. Thanks.
First, try putting the file in the same directory as your script to test it. Make sure that the file is valid and that the script runs successfully.
Here's the source code for reference:
If client_secret.json is in the same directory as the file you're running, then the correct syntax is:
import os
DIRNAME = os.path.dirname(__file__)
credentials = ServiceAccountCredentials.from_json_keyfile_name(os.path.join(DIRNAME, 'client_secret.json'), scope)
If the above test works, move the file to your target directory '/Users/user/json-keys/client_secret.json' and create a symbolic link in the current directory that points to the client_secret.json file. Then run the program with the code above to test it again, and make sure there is no problem with the file in that directory. It's a workaround.
I used this question as a reference:
Django not recognizing or seeing JSON file
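Note that __file__ is not defined inside a notebook cell, so the os.path.dirname(__file__) snippet above only works when the code runs as a script. A minimal sketch of a lookup that does not depend on __file__, assuming the key file may sit in one of several candidate directories. find_credential_file is a hypothetical helper, and the Drive mount path is an assumption about a typical Colab setup:

```python
from pathlib import Path


def find_credential_file(filename, search_dirs):
    """Return the first existing path to filename among search_dirs."""
    tried = []
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / filename
        tried.append(str(candidate))
        if candidate.is_file():
            return candidate
    # Listing every attempted path makes the FileNotFoundError easier to debug.
    raise FileNotFoundError("Key file not found; tried: " + ", ".join(tried))


# Example search order: current directory first, then a mounted Drive folder
# (assumed location; adjust it to where the file actually lives).
candidate_dirs = [Path.cwd(), "/content/drive/MyDrive"]
```

The returned path can be passed as str(path) to ServiceAccountCredentials.from_json_keyfile_name in place of the hard-coded PATH_TO_CREDENTIAL.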

Read formula in the Google Sheets cells using Python

I am trying to download a Google Sheets document as a Microsoft Excel document using Python. I have been able to accomplish this task using the Python module googleapiclient.
However, the Sheets document may contain some formulas which are not compatible with Microsoft Excel (https://www.dataeverywhere.com/article/27-incompatible-formulas-between-excel-and-google-sheets/).
When I use the application I created on any Google Sheets document that used any of these formulas anywhere, I get a bogus Microsoft Excel document as output.
I would like to read the cell values in the Google Sheets document before downloading it as a Microsoft Excel document, just to prevent any such errors from happening.
The code I have written thus far is attached below:
import sys
import os
from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools
SCOPES = "https://www.googleapis.com/auth/drive.readonly"
store = file.Storage("./credentials/credentials.json")
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets("credentials/client_secret.json",
                                          SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build("drive", "v3", http = creds.authorize(Http()))
print("Usage: tmp.py <name of the spreadsheet>")
FILENAME = sys.argv[1]
SRC_MIMETYPE = "application/vnd.google-apps.spreadsheet"
DST_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
files = DRIVE.files().list(
    q = 'name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy = "modifiedTime desc,name").execute().get("files", [])
if files:
    fn = '%s.xlsx' % os.path.splitext(files[0]["name"].replace(" ", "_"))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end = "")
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, "wb") as f:
            f.write(data)
        print("Done")
    else:
        print("ERROR: Could not download file")
else:
    print("ERROR: File not found")
If you want to use Python to export something from Google Docs, the simplest way is to let Google's own servers do the job for you.
I was doing a little web scraping on Google Sheets and wrote this small program, which does exactly that. You just have to insert the ID of the document you want to download.
I put in a temporary ID so anyone can try it out.
import requests
ext = 'xlsx' #csv, ods, html, tsv and pdf can be used as well
key = '1yEoHh7WL1UNld-cxJh0ZsRmNwf-69uINim2dKrgzsLg'
url = f'https://docs.google.com/spreadsheets/d/{key}/export?format={ext}'
res = requests.get(url)
with open(f'file.{ext}', 'wb') as f:
    f.write(res.content)
That way the conversion will almost certainly be correct, because this is the same as clicking the export button in the browser version of Google Sheets. Note that this request sends no credentials, so it only works for documents that are shared with anyone who has the link.
If you are planning to work with the data in Python, I recommend using the csv format instead of xlsx and then recreating the necessary formulas in Python.
I think the gspread library might be what you are looking for. https://gspread.readthedocs.io/en/latest/
Here's a code sample:
import tenacity
import gspread
from oauth2client.service_account import ServiceAccountCredentials

@tenacity.retry(wait=tenacity.wait_exponential()) # If you exceed the Google API quota, this waits to retry your request
def loadGoogleSheet(spreadsheet_name):
    # use creds to create a client to interact with the Google Drive API
    print("Connecting to Google API...")
    scope = [
        'https://spreadsheets.google.com/feeds',
        'https://www.googleapis.com/auth/drive'
    ]
    creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
    client = gspread.authorize(creds)
    spreadsheet = client.open(spreadsheet_name)
    return spreadsheet

def readGoogleSheet(spreadsheet):
    sheet = spreadsheet.sheet1 # Might need to loop through sheets or whatever
    val = sheet.cell(1, 1).value # This just gets the value of the first cell. The docs I linked to above are pretty helpful on all the other stuff you can do
    return val

test_spreadsheet = loadGoogleSheet('Copy of TLO Summary - Template DO NOT EDIT')
test_output = readGoogleSheet(test_spreadsheet)
print(test_output)
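To check formulas before exporting, the Sheets API can return the formulas instead of the computed values (valueRenderOption=FORMULA on spreadsheets.values.get). A sketch of the check itself, assuming the formulas have already been fetched as a list of rows. The function set below is an illustrative, incomplete list of Sheets-only functions, and find_sheets_only_formulas is a hypothetical helper:

```python
import re

# Illustrative subset of functions that exist in Google Sheets but not in Excel.
SHEETS_ONLY = {"GOOGLEFINANCE", "GOOGLETRANSLATE", "IMPORTRANGE", "IMPORTXML",
               "IMPORTHTML", "IMPORTDATA", "IMPORTFEED", "QUERY", "ARRAYFORMULA",
               "SPARKLINE"}

# A function call is an identifier directly followed by an opening parenthesis.
FUNC_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\s*\(")


def find_sheets_only_formulas(rows):
    """Return (row, col, function) for each Sheets-only call, 1-indexed."""
    hits = []
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            if not isinstance(cell, str) or not cell.startswith("="):
                continue  # plain value, not a formula
            for name in FUNC_NAME.findall(cell):
                if name.upper() in SHEETS_ONLY:
                    hits.append((r, c, name.upper()))
    return hits


rows = [["=SUM(A2:A5)", 10],
        ["=ARRAYFORMULA(QUERY(A:B, \"select A\"))", "text"]]
print(find_sheets_only_formulas(rows))
```

The regular expression also matches function-like text inside string literals, so treat the result as a list of cells to inspect rather than a definitive verdict.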

python gspread - How to get a spreadsheet URL path in after i create it?

I'm trying to create a new spreadsheet using the gspread python package, then get its URL path (inside the google drive) and send it to other people so they could go in as well.
I tried to find an answer here and here, with no luck.
I created a brand new Spreadsheet:
import gspread
import pandas as pd
from gspread_dataframe import get_as_dataframe, set_with_dataframe
gc = gspread_connect()
spreadsheet = gc.create('TESTING SHEET')
Then I shared it with my account:
spreadsheet.share('my_user@my_company.com', perm_type='user', role='writer')
Then I wrote some random data into it:
worksheet = gc.open('TESTING SHEET').sheet1
df = pd.DataFrame.from_records([{'a': i, 'b': i * 2} for i in range(100)])
set_with_dataframe(worksheet, df)
Now when I go to my Google Drive, I can find this sheet by searching for its name ("TESTING SHEET").
But I couldn't figure out how to get the URL path in my Python code, so that I can pass it to other people right away.
Thanks!
You can generate the URL by using Spreadsheet.id. Here's an example that uses spreadsheet variable from your code:
spreadsheet_url = "https://docs.google.com/spreadsheets/d/%s" % spreadsheet.id
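Going the other way is sometimes useful too, for example when someone sends the link back and the ID is needed again. A small sketch, assuming the /spreadsheets/d/<id> URL layout shown above. Both helper names are hypothetical:

```python
import re

# The ID is the path segment that follows /spreadsheets/d/.
_ID_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def spreadsheet_url(spreadsheet_id):
    """Build the browser URL from a spreadsheet ID."""
    return "https://docs.google.com/spreadsheets/d/%s" % spreadsheet_id


def spreadsheet_id_from_url(url):
    """Extract the spreadsheet ID from a URL, or return None if absent."""
    match = _ID_PATTERN.search(url)
    return match.group(1) if match else None
```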

gspread reading a google sheet file using python 3

I'm using Python 3 running Pycharm and the module gspread to read google sheet files in my google drive. Well, I've followed all steps from this link about how to read a file. Unfortunately, my code here below doesn't work yet.
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope =['https://docs.google.com/spreadsheets/d/1xnaOZMd2v93tY28h_hsuMnZYXC9YqCfFpQX70lwpN94/edit?usp=sharing']
credentials = ServiceAccountCredentials.from_json_keyfile_name('distech-c1e26e7150b2.json',scope)
gc = gspread.authorize(credentials)
wks = gc.open("POC").sheet1
for temp in wks:
    print(temp)
How can I read the Google Sheet file using this module? Thanks so much.
I figured it out. After some deeper research I realized two things.
The scope in my code was wrong: the scope is not the URL of the sheet but one of the scopes provided by the Google API, which grants the access permissions for spreadsheets.
The right scope for me was: scope =['https://spreadsheets.google.com/feeds']
The opener (gc.open) just opens the spreadsheet, which then gives access to the worksheets within my file.
So the solution is thanks to @Pedro Lobito in his post here.
Solution:
I had the same issue with just a couple of spreadsheets on my account, the problem was solved by:
Opening the json key file (projectname-ca997255dada.json)
Finding the value of client_email, i.e.: "client_email": "278348728734832-compute@developer.gserviceaccount.com"
Sharing your sheet(s) with that email
Now my code looks like:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope =['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('xxx.json',scope)
gc = gspread.authorize(credentials)
spreadsheet = gc.open("POC")
wks = spreadsheet.worksheet('test1')
wks2 = spreadsheet.worksheet('test2')
out = wks.col_values(1)
for temp in out:
    print(temp)
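The sharing step above can be made less error-prone by reading the address straight from the key file instead of copying it by hand. A minimal sketch, assuming a standard service-account JSON key with a client_email field. service_account_email is a hypothetical helper:

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service-account JSON key file."""
    with open(key_path, encoding="utf-8") as fh:
        key = json.load(fh)
    try:
        return key["client_email"]
    except KeyError:
        # A missing field usually means this is not a service-account key.
        raise ValueError("%s has no client_email field" % key_path) from None
```

Calling service_account_email('xxx.json') gives the address to enter in the sheet's sharing dialog.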
