I am trying to build a Google Sheets-based chat app in Python, and I'm having trouble understanding how to read from and write to a spreadsheet on my Drive (without using the Google API, of course; the reason is explained at the end*).
So far I can fetch the file, but I can't read its content. Like so:
import pandas as pd
import requests
from io import StringIO
orig_url='https://docs.google.com/spreadsheets/d/1bnCDl1DqRLqO8xHx3sjWdkydYC7rEb3vjpXUZ3ps2tY/edit?usp=sharing'
file_id = orig_url.split('/')[-2]
dwn_url='https://drive.google.com/uc?export=download&id=' + file_id
url = requests.get(dwn_url).text
csv_raw = StringIO(url)
dfs = pd.read_csv(csv_raw)
print(dfs.head())
P.S. I looked online for many other resources; from what I can tell, they all use the Google API.
*I am building the chat app as part of a course, and APIs are not part of it yet, so I cannot use the Google API.
Please refer to the Google API documentation, specifically the Google Sheets API: https://developers.google.com/sheets/api
There you can read about how to enable the Sheets API, create authentication tokens, and make the API calls in your preferred language.
If you don't want to use Google API, you can try sheetdb.io. It's a tool that turns a Google Spreadsheet into a JSON API. So you can use HTTP requests to read and write to a Google Spreadsheet.
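If the spreadsheet is shared so that anyone with the link can view it, another API-free option for reading is to request the sheet's CSV export rather than the Drive download link. The following is a minimal, stdlib-only sketch under that assumption, not a definitive implementation: the helper names csv_export_url, parse_csv and read_public_sheet are made up for this example, gid=0 is assumed to be the first worksheet, and it only covers reading (writing still needs an API or a service like the one mentioned above).

```python
import csv
import io
import re
import urllib.request


def csv_export_url(share_url, gid=0):
    """Turn a Sheets share/edit URL into its CSV export URL (hypothetical helper)."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", share_url)
    if match is None:
        raise ValueError("not a Google Sheets URL: %r" % share_url)
    return "https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%d" % (
        match.group(1), gid)


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_public_sheet(share_url, gid=0):
    """Download and parse a link-shared sheet; needs network access."""
    with urllib.request.urlopen(csv_export_url(share_url, gid)) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

With pandas, passing csv_export_url(orig_url) straight to pd.read_csv should work under the same sharing assumption.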
I am using gspread library in my python script to connect to the Google excel sheet and I was wondering if there is a way to also grab the version history via gspread library?
From your statement "I am using gspread library in my python script to connect to the Google excel sheet", I assume that "Google excel sheet" means a Google Spreadsheet.
Regarding "I was wondering if there is a way to also grab the version history via gspread library?": unfortunately, at present gspread seems to have no method for retrieving the revision list of a Spreadsheet. However, the revision list can be retrieved when googleapis for Python is used.
In this answer, I propose retrieving the revision list of the Spreadsheet using the gspread client. In recent versions of gspread, the client's credentials can easily be reused with googleapis, and gspread's scopes already include the Drive API, so this might be useful for your situation. The sample script is as follows.
Sample script:
import gspread
from googleapiclient.discovery import build
client = gspread.oauth(
    credentials_filename="###",  # Please set your file.
    authorized_user_filename="###",  # Please set your file.
)
spreadsheetId = "###" # Please set your Spreadsheet ID.
service = build("drive", "v3", credentials=client.auth)
revisions = service.revisions().list(fileId=spreadsheetId).execute()
print(revisions)
When this script is run, the revision list can be retrieved from the Spreadsheet.
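The returned value is a plain dictionary, so the individual revisions can be pulled out with ordinary Python. The sketch below assumes the default Drive API v3 response shape (a "revisions" list whose entries carry "id" and "modifiedTime"); the helper name summarize_revisions and the sample data are hypothetical and only illustrate the shape.

```python
def summarize_revisions(response):
    """Return (id, modifiedTime) pairs from a Drive v3 revisions.list response dict."""
    return [(rev.get("id"), rev.get("modifiedTime"))
            for rev in response.get("revisions", [])]


# Hypothetical response, trimmed to the fields used here.
sample = {
    "kind": "drive#revisionList",
    "revisions": [
        {"id": "1", "modifiedTime": "2023-01-01T10:00:00.000Z"},
        {"id": "2", "modifiedTime": "2023-01-02T12:30:00.000Z"},
    ],
}

for rev_id, modified in summarize_revisions(sample):
    print(rev_id, modified)
```

In the script above, summarize_revisions(revisions) would be called on the real response instead of the sample.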
Note:
For example, if you want to access the data of a specific version, the following threads might be useful.
Google Drive API V3: get the content of a revision
How to get older versions of Google Spreadsheet data?
Revert Revision of an Excel File - Drive API
Reference:
Revisions: list
In R we can read a private google sheet given its URL simply with two lines of code.
library(googlesheets4)
manifest <- read_sheet(url)
The library googlesheets4 takes care of the authentication, where to store the information etc. and loads everything automatically into a table.
How can I do something similar with python?
import pandas as pd
import package as pck # Some package
pandas_dataframe = pck.read(sheetURL)
Is there a python package that does this? Ideally it would take care of authentication.
A straightforward way to open spreadsheets in Python is to use the gspread module, as follows:
import gspread
gc = gspread.service_account()
sh = gc.open("SampleSheet")
print(sh.sheet1.get('Column1'))
However, you need to authenticate with your Google account in order to access your own sheets. You can follow this documentation to set up a development environment for processing Google data in Python.
Alternatively, you can use Google Colab notebooks, which give you easy access to the data you have stored on Drive without much coding.
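To get closer to the pandas one-liner asked for in the question, the gspread calls can be wrapped in a small function. This is only a sketch under stated assumptions: read_sheet and rows_to_records are hypothetical names, gspread and pandas are assumed to be installed, and service-account credentials are assumed to sit where gspread.service_account() looks for them by default.

```python
def rows_to_records(values):
    """Convert a list of rows (first row = header) into a list of dicts."""
    if not values:
        return []
    header, *rows = values
    return [dict(zip(header, row)) for row in rows]


def read_sheet(url):
    """Load the first worksheet behind a spreadsheet URL into a pandas DataFrame.

    Assumes gspread and pandas are installed and credentials are configured.
    """
    import gspread  # imported lazily so rows_to_records works without it
    import pandas as pd

    gc = gspread.service_account()
    worksheet = gc.open_by_url(url).sheet1
    return pd.DataFrame(rows_to_records(worksheet.get_all_values()))
```

gspread's own worksheet.get_all_records() performs a similar header mapping, so pd.DataFrame(worksheet.get_all_records()) is a shorter alternative.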
I am referring to the Google Drive API guide below to export a Google spreadsheet as CSV.
https://developers.google.com/drive/v2/web/manage-downloads
It's working fine, but as mentioned in the guide, it downloads only the first sheet in CSV format. I am looking for a way to download all worksheets as separate CSV files.
The gdata library for Python stopped working after OAuth1 was deprecated.
Please suggest an approach if someone has done this successfully with OAuth2.
The Drive API itself does not offer a way to export a specific worksheet, but with a valid bearer token you can download it in the desired format via its Google Drive URL. The pattern is https://docs.google.com/spreadsheets/d/{{document_id}}/export?format=tsv&gid={{gid}}. The document_id and gid are visible in the browser URL bar when you open the sheet in Google Drive. Replace tsv with whatever format you need.
I'm not good enough at Python, but I created a demo app in Node.js: git@github.com:joerx/drive-exporter.git. The general flow is:
Obtain an access token via the Google APIs OAuth2 flow
Figure out the document_id and gid
Construct download url
Make a regular HTTP request passing the token as Authorization: Bearer {{token}}
For public sheets you can skip the authorisation part.
General documentation for using Drive API and OAuth2 (with examples in Python) is here
If you need to programmatically determine the gid, this might help: How to convert Google spreadsheet's worksheet string id to integer index (GID)?
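In Python, the flow described above could look roughly like the following stdlib-only sketch. It assumes you already hold an OAuth2 access token (or that the sheet is public, in which case token stays None) and that you know the gid of every worksheet; build_export_request and export_worksheets are hypothetical helper names, and the output file naming is an arbitrary choice.

```python
import urllib.request

EXPORT_URL = ("https://docs.google.com/spreadsheets/d/"
              "{document_id}/export?format={fmt}&gid={gid}")


def build_export_request(document_id, gid, token=None, fmt="csv"):
    """Build the HTTP request for one worksheet; token=None for public sheets."""
    url = EXPORT_URL.format(document_id=document_id, fmt=fmt, gid=gid)
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", "Bearer " + token)
    return request


def export_worksheets(document_id, gids, token=None, fmt="csv"):
    """Download each worksheet to <gid>.<fmt>; needs network access."""
    for gid in gids:
        request = build_export_request(document_id, gid, token, fmt)
        filename = "%s.%s" % (gid, fmt)
        with urllib.request.urlopen(request) as resp, open(filename, "wb") as out:
            out.write(resp.read())
```

Calling export_worksheets(document_id, [0, 123456], token=access_token) would then write one file per worksheet, with the gid values here being placeholders.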
I just started using Gspread and am trying to access one of my google docs spreadsheets in my google drive. I followed the instructions and went to Google API console and created a JSON file. When I run this code I get no errors:
import gspread
import json
from oauth2client.client import SignedJwtAssertionCredentials
json_key = json.load(open("mwsSearch-b3d5d5d9c956.json"))
scope = ["https://spreadsheets.google.com/feeds"]
credentials = SignedJwtAssertionCredentials(json_key['client_email'], bytes(json_key['private_key'], 'utf-8'), scope=scope)
gc = gspread.authorize(credentials)
My next step is to try to open a google spreadsheet. I created a spreadsheet titled "Mike" in the main folder of my Google Drive, but have tried to access it via:
gc.open_by_url("https://docs.google.com/spreadsheets/d/1HH7BKsnB2Rd5rlAr7S2H3avUtO4GkqeWrXJfYKKooNA/edit#gid=0")
gc.open("Mike")
gc.open_by_key("1HH7BKsnB2Rd5rlAr7S2H3avUtO4GkqeWrXJfYKKooNA")
All three of these return the same error:
gspread.exceptions.SpreadsheetNotFound
I suspect the API access is tied to separate storage belonging to the project rather than to my personal Google Drive, which would explain why the spreadsheet is not found. Could someone with more experience point me in the right direction on what I am doing wrong? All help is appreciated. Thank you.
You'll need to share the spreadsheet you want to access with the email address that was created along with the JSON key. It will be something like 9876.....@developer.gserviceaccount.com. You'll find it as the "client email" in your JSON file and on your credentials page.
SignedJwtAssertionCredentials has been deprecated.
Look at http://gspread.readthedocs.org/en/latest/oauth2.html
Go to Google Sheets and share your spreadsheet with the email address stored in json_key['client_email']. Otherwise you’ll get a SpreadsheetNotFound exception when trying to open it.
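A small sketch of how this could be checked from code, using the non-deprecated gspread.service_account() entry point. The helper names client_email_from_key and open_shared_sheet are hypothetical, and depending on the gspread version a missing share may surface as a different exception than SpreadsheetNotFound, so treat the except clause as an assumption.

```python
import json


def client_email_from_key(key_json_text):
    """Return the service-account address the spreadsheet must be shared with."""
    return json.loads(key_json_text)["client_email"]


def open_shared_sheet(key_path, spreadsheet_key):
    """Open a spreadsheet with a service account, explaining the usual failure."""
    import gspread  # lazy import: only needed when actually connecting

    gc = gspread.service_account(filename=key_path)
    try:
        return gc.open_by_key(spreadsheet_key)
    except gspread.exceptions.SpreadsheetNotFound:
        with open(key_path) as fh:
            email = client_email_from_key(fh.read())
        raise RuntimeError("Share the spreadsheet with %s first" % email)
```

Printing client_email_from_key(open(key_path).read()) before sharing is a quick way to find the address to paste into the Share dialog.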
I am attempting to read the raw text/content of a Google Doc (just a plain document, not a spreadsheet or presentation) from within a Python script, but so far have had little success.
Here's what I've tried:
import gdata.docs.service
client = gdata.docs.service.DocsService()
client.ClientLogin('email', 'password')
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('email', 'Folder Name')
feed = client.Query(q.ToUri())
doc = feed.entry[0] # extract one of the documents
However, this variable doc, which is of type gdata.docs.DocumentListEntry, doesn't seem to contain any content, just meta information about the document.
Am I doing something wrong here? Can somebody point me in the right direction? Thank you!
UPDATE (Mar 2019) Good news! The Google Docs REST API is now available. More info about it from my SO answer to a similar question, but to get you going, here's the official Python "quickstart" sample showing you how to get the title of a Google Doc in plain text.
Both the Apps Script and Drive REST API solutions originally answered below are still valid and are alternate ways to get the contents of a Google Doc. (The Drive API works on both Python 2 & 3, but Apps Script is JavaScript-only.)
Bottom-line: if you want to download the entire Doc in plain text, the Drive API solution is best. If you want to programmatically CRUD different parts of a Doc, then you must use either the Docs API or Apps Script.
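For the Docs API route, the document comes back as nested JSON rather than plain text, so the text has to be collected from the paragraph elements. Here is a minimal sketch, assuming the response of a call like DOCS.documents().get(documentId=...).execute() (DOCS being a hypothetical Docs API service endpoint); it only walks top-level paragraphs and ignores tables and other structures, and the sample document is made up.

```python
def read_paragraph_text(document):
    """Concatenate the text runs of every top-level paragraph in a Docs API document."""
    pieces = []
    for block in document.get("body", {}).get("content", []):
        paragraph = block.get("paragraph")
        if paragraph is None:  # skip section breaks, tables, etc.
            continue
        for element in paragraph.get("elements", []):
            text_run = element.get("textRun")
            if text_run:
                pieces.append(text_run.get("content", ""))
    return "".join(pieces)


# Hypothetical, heavily trimmed document resource.
sample_doc = {
    "title": "Hello World",
    "body": {"content": [
        {"sectionBreak": {}},
        {"paragraph": {"elements": [
            {"textRun": {"content": "Hello, "}},
            {"textRun": {"content": "world!\n"}},
        ]}},
    ]},
}

print(read_paragraph_text(sample_doc))
```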
(Feb 2017) The code in the OP and the only other answer are both now out-of-date as ClientLogin authentication was deprecated back in 2012(!), and GData APIs are the previous generation of Google APIs. While not all GData APIs have been deprecated, all newer Google APIs do not use the Google Data protocol.
There isn't a REST API available (at this time) for Google Docs documents, although there is an "API-like" service provided by Google Apps Script, the JavaScript-in-the-cloud solution which provides programmatic access to Google Docs (via its DocumentService object), including Docs add-ons.
To read plain text from a Google Doc, considered file-level access, you would use the Google Drive API instead. Examples of using the Drive API:
Exporting a Google Sheet as CSV (blog post)
"Poor man's plain text to PDF" converter (blog post) (*)
(*) - TL;DR: upload plain text file to Drive, import/convert to Google Docs format, then export that Doc as PDF. Post above uses Drive API v2; this follow-up post describes migrating it to Drive API v3, and here's a developer video combining both "poor man's converter" posts.
The solution to the OP is to perform similar operations as what you see in both posts above but ensure you're using the text/plain export MIMEtype. For other import/export formats to/from Drive, see this related question SO answer as well as the downloading files from Drive docs page. Here's some pseudocode that searches for Google Docs documents called "Hello World" in my Drive folder and displays the contents of the first matching file found on-screen (assuming DRIVE is your API service endpoint):
from __future__ import print_function
NAME = 'Hello World'
MIME = 'text/plain'
# DRIVE is assumed to be an authorized Drive API service endpoint created elsewhere
# using Drive API v3; if using v2, change 'pageSize' to 'maxResults',
# 'name=' to 'title=', and ".get('files')" to ".get('items')"
res = DRIVE.files().list(q="name='%s'" % NAME, pageSize=1).execute().get('files')
if res:
    fileID = res[0]['id']  # 1st matching "Hello World" name
    res = DRIVE.files().export(fileId=fileID, mimeType=MIME).execute()
    if res:
        print(res.decode('utf-8'))  # decode bytes for Py3; NOP for Py2
If you need more than this, see these videos on how to set up the use of Google APIs, OAuth2 authorization, and creating a Drive service endpoint to list your Drive files, plus a corresponding blog post for all three.
To learn more about how to use Google APIs with Python in general, check out my blog as well as a variety of Google developer videos (series 1 and series 2) I'm producing.
A DocumentQuery doesn't return you all the documents with their contents—that would take forever. It just returns a list of documents, with metadata about each. (Actually, IIRC you can get a preview page this way, so if your document is only one page that might be enough…)
You then need to download the content in a separate request. The content element has a type (the MIME type) and a src (the URL to the actual data). You can just download that src, and parse it. However, you can override the default type by adding an exportFormat parameter, so you don't need to do any parsing.
See the section Downloading documents and files in the docs, which has an example showing how to download a document and specify a format. (It's in .NET rather than Python, and it uses HTML rather than plain text, but you should be able to figure it out.)