I have a table (as a Pandas DataFrame) of (mostly) GitHub repos, for which I need to automatically extract the LICENSE link. However, it is a requirement that the link does not simply point to /blob/master/ but to a specific commit, since the master link might be updated at some point. I assembled a Python script to do this through the GitHub API, but via the API I am only able to retrieve the link with the master tag.
I.e. instead of
https://github.com/jsdom/abab/blob/master/LICENSE.md
I want
https://github.com/jsdom/abab/blob/8abc2aa5b1378e59d61dee1face7341a155d5805/LICENSE.md
Any idea if there is a way to automatically get the link to the latest commit for a file, in this case the LICENSE file?
This is the code I have written so far:
def githubcrawl(repo_url, session, headers):
    parts = repo_url.split("/")[3:]
    url_tmpl = "http://api.github.com/repos/{}/license"
    url = url_tmpl.format("/".join(parts))
    try:
        response = session.get(url, headers=headers)
        if response.status_code in [404]:
            return f"404: {repo_url}"
        else:
            data = json.loads(response.text)
            return data["html_url"]  # Returns the html URL to LICENSE file
    except urllib.error.HTTPError as e:
        print(repo_url, "-", e)
        return f"http_error: {repo_url}"
token="mytoken" # Token for github authentication to get more requests per hour
headers={"Authorization": "token %s" % token}
session = requests.Session()
lizlinks = [] # List to store the links of the LICENSE files in
# iterate over DataFrame of applications/deps
for idx, row in df.iterrows():
# if idx < 5:
if type(row["Homepage"]) == type("str"):
repo_url = re.sub(r"\#readme", "", row["Homepage"])
response = session.get(repo_url, headers=headers)
repo_url = response.url # Some URLs are just redirects, so I get the actual repo url here
if "github" in repo_url and len(repo_url.split("/")) >= 3:
link = githubcrawl(repo_url, session, headers)
print(link)
lizlinks.append(link)
else:
print(row["Homepage"], "Not a github Repo")
lizlinks.append("Not a github repo")
else:
print(row["Homepage"], "Not a github Repo")
lizlinks.append("Not a github repo")
Bonus question: Would parallelizing this task work with the GitHub API? I.e., could I send multiple requests at once without being locked out (DoS), or is the for-loop a good approach to avoid this? It takes quite a while to go through the ~1000 repos I have in that list.
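To illustrate what I mean, here is a minimal sketch of issuing the requests from a thread pool (the pool size and wrapper are just placeholders; as I understand it, GitHub's limit of roughly 5,000 authenticated requests per hour is counted per token regardless of concurrency):

from concurrent.futures import ThreadPoolExecutor

def crawl_many(repo_urls, session, headers, max_workers=8):
    # issue up to max_workers requests concurrently, preserving input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda url: githubcrawl(url, session, headers), repo_urls))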
OK, I found a way to get the unique SHA hash of the current commit. I believe that should always link to the license file as of that point in time.
Using the Python git library (GitPython), I simply run the git ls-remote command and return the HEAD SHA:
def lsremote_HEAD(url):
    g = git.cmd.Git()
    HEAD_sha = g.ls_remote(url).split()[0]
    return HEAD_sha
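For example (illustrative; the SHA returned is whatever HEAD points to at call time):

head_sha = lsremote_HEAD("https://github.com/jsdom/abab")
print(head_sha)  # e.g. 8abc2aa5b1378e59d61dee1face7341a155d5805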
I can then replace the "master", "main" or whatever tag in my githubcrawl function:
token="token_string"
headers={"Authorization": "token %s" % token}
session = requests.Session()
def githubcrawl(repo_url, session, headers):
parts = repo_url.split("/")[3:]
api_url_tmpl = "http://api.github.com/repos/{}/license"
api_url = api_url_tmpl.format("/".join(parts))
try:
print(api_url)
response = session.get(api_url, headers=headers)
if response.status_code in [404]:
return(f"404: {repo_url}")
else:
data = json.loads(response.text)
commit_link = re.sub(r"/blob/.+?/",rf"/blob/{lsremote_HEAD(repo_url)}/", data["html_url"])
return(commit_link)
except urllib.error.HTTPError as e:
print(repo_url, "-", e)
return f"http_error: {repo_url}"
Maybe this helps someone, so I'm posting this answer here.
This answer uses the following libraries:
import re
import git
import urllib
import json
import requests
I am able to get a token using the default scope for Power BI ("scope": ["https://analysis.windows.net/powerbi/api/.default"]).
With that token, I am able to read the workspaces my user has access to ("https://api.powerbi.com/v1.0/myorg/groups") and the report information inside each of those workspaces ("https://api.powerbi.com/v1.0/myorg/reports/").
But it does not matter if I reuse the same token or acquire a brand new one: if I try to export a specific report, I get a 401 error code. This is the way I am issuing the requests.get:
token_ = <new token or reused from previous get requests>
reports = requests.get(  # Use token to call downstream service
    config['reports'] + report_id + '/Export',
    headers={'Authorization': 'Bearer ' + token_},)
Now, if I go to https://learn.microsoft.com/en-us/rest/api/power-bi/reports/getreportsingroup and sign in (with the same user I am using in my Python script), grab the token from that page and use it in my script, it works. If I use it in Postman, it works.
If I try to use the token acquired by my script in Postman, I also get a 401 error. So, yes, my script is not getting the correct token for this particular endpoint, but the token is good enough for the groups and reports endpoints.
Is there anything I need to add to the request for the token for this particular endpoint?
Thank you very much,
Andres
Here is the full script I am using; there is also a params.json that looks like this:
{
  "authority": "https://login.microsoftonline.com/1abcdefg-abcd-48b6-9b3c-bd5123456",
  "client_id": "5d2545-abcd-4765-8fbb-53555f2fa91",
  "username": "myusername@tenant",
  "password": "mypass",
  "scope": ["https://analysis.windows.net/powerbi/api/.default"],
  "workspaces": "https://api.powerbi.com/v1.0/myorg/groups",
  "reports": "https://api.powerbi.com/v1.0/myorg/reports/"
}
# script based on the msal GitHub library sample
import sys  # For simplicity, we'll read config file from 1st CLI param sys.argv[1]
import json
import logging
import requests
import msal

def exportReport(report_id, token_):
    result = app.acquire_token_by_username_password(
        config["username"], config["password"], scopes=config["scope"])
    token_ = result['access_token']
    print(f'Using token: {token_}')
    reports = requests.get(  # Use token to call downstream service
        config['reports'] + report_id + '/Export',
        headers={'Authorization': 'Bearer ' + token_},)
    print(f'-reports: {reports.status_code}')

def list_reports(workspace_id, ws_id, ws_name, token_):
    print(f'reports id for workspace {ws_name}')
    for rp in workspace_id['value']:
        if rp["id"] == "1d509119-76a1-42ce-8afd-bd3c420dd62d":
            exportReport("1d509119-76a1-42ce-8afd-bd0c420dd62d", token_)

def list_workspaces(workspaces_dict):
    for ws in workspaces_dict['value']:
        yield (ws['id'], ws['name'])

config = json.load(open('params.json'))
app = msal.PublicClientApplication(
    config["client_id"], authority=config["authority"],
)

result = None
if not result:
    logging.info("No suitable token exists in cache. Let's get a new one from AAD.")
    result = app.acquire_token_by_username_password(
        config["username"], config["password"], scopes=config["scope"])

if "access_token" in result:
    workspaces = requests.get(  # Use token to call downstream service
        config['workspaces'],
        headers={'Authorization': 'Bearer ' + result['access_token']},).json()
    ids = list_workspaces(workspaces)  # prepare workspace generator
    headers = {'Authorization': 'Bearer ' + result['access_token']}
    while True:
        try:
            ws_id, ws_name = next(ids)
            reports = requests.get(  # Use token to call downstream service
                config['workspaces'] + '/' + ws_id + '/reports',
                headers={'Authorization': 'Bearer ' + result['access_token']},).json()
            list_reports(reports, ws_id, ws_name, result['access_token'])
        except StopIteration:
            exit(0)
else:
    print(result.get("error"))
    print(result.get("error_description"))
    print(result.get("correlation_id"))  # You may need this when reporting a bug
    if 65001 in result.get("error_codes", []):
        # AAD requires user consent for U/P flow
        print("Visit this to consent:", app.get_authorization_request_url(config["scope"]))
From your description, I suppose you didn't grant the correct permission to the AD App used to sign in your user account in the code; please follow the steps below.
Navigate to the Azure portal -> Azure Active Directory -> App registrations -> find the AD App used in the code (filter with All applications) -> API permissions -> add the Report.Read.All Delegated permission of the Power BI Service API (this permission is just for read actions; if you need further write operations, choose Report.ReadWrite.All) -> click the Grant admin consent for xxx button at the end.
Update:
Using the application ID of the access token obtained from Get-PowerBIAccessToken solved the issue.
I am attempting to create a scheduled backup of Datastore via my Python Flask application (Python 3) to Cloud Storage. I am comfortable with the scheduling aspect of it, but I am having difficulty with the export.
I was using https://cloud.google.com/datastore/docs/schedule-export as a starting point; however, it references
from google.appengine.api import urlfetch
which is no longer supported. I have been looking into urllib:
import datetime
import json
import urllib.request

from google.cloud import datastore  # assumption: `client` below is a Cloud Datastore client

client = datastore.Client()

url = 'https://datastore.googleapis.com/v1/projects/application-name-placeholder'
timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
output_url_prefix = 'gs://datastore-backup-test-name-placeholder/example'

query = client.query(kind='__kind__')
query.keys_only()
kinds = [entity.key.id_or_name for entity in query.fetch()]

query = client.query(kind='__namespace__')
query.keys_only()
all_namespaces = [entity.key.id_or_name for entity in query.fetch()]

entity_filter = {
    'kinds': kinds,
    'namespace_ids': all_namespaces
}
request = {
    'project_id': 'application-name-placeholder',
    'output_url_prefix': output_url_prefix,
    'entity_filter': entity_filter
}
headers = {
    'Content-Type': 'application/json'
}

response = urllib.request.Request(url)
response.add_header('Content-type', 'application/json')
result = urllib.request.urlopen(response, data=bytes(json.dumps(request), encoding="utf-8"))
At the moment I am getting
urllib.error.HTTPError: HTTP Error 404: Not Found
Not sure if my URL for Datastore is the correct approach, but I think there are other issues with my approach as well. Some guidance would be appreciated.
In your URL you have url = 'https://datastore.googleapis.com/v1/projects/application-name-placeholder', but the linked documentation has url = 'https://datastore.googleapis.com/v1/projects/%s:export' % app_id. You are missing the trailing :export.
Given that you are trying to export your whole database, you should remove your entity filter. Without an entity filter the managed export will export your entire database.
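For illustration, here is a minimal sketch of the corrected call; it assumes Application Default Credentials via the google-auth library, and performs a full export (no entity filter):

import json
import urllib.request

import google.auth
import google.auth.transport.requests

# obtain an OAuth2 access token from Application Default Credentials
credentials, project_id = google.auth.default(
    scopes=['https://www.googleapis.com/auth/datastore'])
credentials.refresh(google.auth.transport.requests.Request())

url = 'https://datastore.googleapis.com/v1/projects/%s:export' % project_id  # note the trailing :export
body = {'output_url_prefix': 'gs://datastore-backup-test-name-placeholder/example'}  # no entity filter: full export

req = urllib.request.Request(url, data=json.dumps(body).encode('utf-8'))
req.add_header('Content-Type', 'application/json')
req.add_header('Authorization', 'Bearer %s' % credentials.token)
result = urllib.request.urlopen(req)
print(result.read())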
I am kind of a newbie to REST and the testing department. I needed to write automation scripts to test our REST services. We are planning to run these scripts from a Jenkins CI job regularly. I prefer writing these in Python, as we already have UI functionality testing scripts in Python generated by Selenium IDE, but I am open to any good solution. I checked httplib, simplejson and xUnit, but I am looking for better solutions available out there.
Also, I would prefer to write a template and generate the actual script for each REST API by reading the API info from XML or something similar. Thanks in advance for all advice.
I usually use Cucumber to test my RESTful APIs. The following example is in Ruby, but could easily be translated to Python using either the rubypy gem or lettuce (a Python sketch follows the Ruby steps below).
Start with a set of RESTful base steps:
When /^I send a GET request for "([^\"]*)"$/ do |path|
  get path
end

When /^I send a POST request to "([^\"]*)" with the following:$/ do |path, body|
  post path, body
end

When /^I send a PUT request to "([^\"]*)" with the following:$/ do |path, body|
  put path, body
end

When /^I send a DELETE request to "([^\"]*)"$/ do |path|
  delete path
end

Then /^the response should be "([^\"]*)"$/ do |status|
  last_response.status.should == status.to_i
end

Then /^the response JSON should be:$/ do |body|
  JSON.parse(last_response.body).should == JSON.parse(body)
end
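For a Python translation of the same base steps, here is a minimal sketch using behave and requests (BASE_URL and the step wiring are assumptions, not part of the original):

import json
import requests
from behave import when, then

BASE_URL = "http://localhost:8000"  # assumption: the service under test

@when('I send a GET request for "{path}"')
def step_get(context, path):
    context.response = requests.get(BASE_URL + path)

@when('I send a POST request to "{path}" with the following')
def step_post(context, path):
    # context.text holds the docstring body from the feature file
    context.response = requests.post(BASE_URL + path, data=context.text,
                                     headers={"Content-Type": "application/json"})

@then('the response should be "{status}"')
def step_status(context, status):
    assert context.response.status_code == int(status)

@then('the response JSON should be')
def step_json(context):
    assert context.response.json() == json.loads(context.text)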
And now we can write features that test the API by actually issuing the requests.
Feature: The users endpoints

  Scenario: Creating a user
    When I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    Then the response should be "200"

  Scenario: Listing users
    Given I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    When I send a GET request for "/users"
    Then the response should be "200"
    And the response JSON should be:
      """
      [{ "name": "Swift", "status": "awesome" }]
      """
... etc ...
These are easy to run on a CI system of your choice. See these links for references:
http://www.anthonyeden.com/2010/11/testing-rest-apis-with-cucumber-and-rack-test/
http://jeffkreeftmeijer.com/2011/the-pain-of-json-api-testing/
http://www.cheezyworld.com/2011/08/09/running-your-cukes-in-jenkins/
import openpyxl
import requests
import json
from requests.auth import HTTPBasicAuth

urlHead = 'https://IP_ADDRESS_HOST:PORT_NUMBER/'
rowStartAt = 2
apiColumn = 2
#payloadColumn = 3
responseBodyColumn = 12
statusCodeColumn = 13

# credentials for HTTP basic auth (placeholders)
user = 'USERNAME'
pwd = 'PASSWORD'

headerTypes = {'Content-Type': 'application/json',
               'Accept': 'application/json',
               'Authorization': '23324'}

wb = openpyxl.load_workbook('Excel_WORKBOOK.xlsx')

# PROCESS EACH SHEET
for sheetName in wb.get_sheet_names():
    print('Sheet Name = ' + sheetName)
    flagVar = input('Enter N To avoid APIs Sheets')
    if flagVar == 'N':
        print('Sheet got skipped')
        continue

    # get a sheet
    sheetObj = wb.get_sheet_by_name(sheetName)

    # for each sheet iterate the APIs
    for i in range(2, sheetObj.max_row + 1):
        # below is the API with method type
        apiFromSheet = sheetObj.cell(row=i, column=apiColumn).value
        if apiFromSheet is None:
            continue
        #print(i, apiFromSheet)
        # let's split the API
        apiType = apiFromSheet.split()[0]
        method = apiFromSheet.split()[1]
        if apiType != 'GET':
            continue
        # let's process GET APIs
        absPath = urlHead + method
        print("REQUESTED TYPE AND PATH = ", apiType, absPath)
        print('\n')
        res = requests.get(absPath, auth=HTTPBasicAuth(user, pwd), verify=False, headers=headerTypes)
        # let's write the response body into the relevant cells
        sheetObj.cell(row=i, column=responseBodyColumn).value = res.text
        sheetObj.cell(row=i, column=statusCodeColumn).value = res.status_code

wb.save('Excel_WORKBOOK.xlsx')
#exit(0)
Can you produce a Python example of how to download a Google Sheets spreadsheet given its key and worksheet ID (gid)? I can't.
I've scoured versions 1, 2 and 3 of the API. I'm having no luck; I can't figure out their complicated Atom-like feeds API, the gdata.docs.service.DocsService._DownloadFile private method says that I'm unauthorized, and I don't want to write an entire Google login authentication system myself. I'm about to stab myself in the face due to frustration.
I have a few spreadsheets and I want to access them like so:
username = 'mygooglelogin@gmail.com'
password = getpass.getpass()

def get_spreadsheet(key, gid=0):
    ... (help!) ...

for row in get_spreadsheet('5a3c7f7dcee4b4f'):
    cell1, cell2, cell3 = row
    ...
Please save my face.
Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (Docs for DocsService here)
import gdata.docs.service
import getpass
import os
import tempfile
import csv

def get_csv(file_path):
    return csv.reader(file(file_path).readlines())

def get_spreadsheet(key, gid=0):
    gd_client = gdata.docs.service.DocsService()
    gd_client.email = 'xxxxxxxxx@gmail.com'
    gd_client.password = getpass.getpass()
    gd_client.ssl = False
    gd_client.source = "My Fancy Spreadsheet Downloader"
    gd_client.ProgrammaticLogin()

    file_path = tempfile.mktemp(suffix='.csv')
    uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
    try:
        entry = gd_client.GetDocumentListEntry(uri)
        # XXXX - The following dies with RequestError "Unauthorized"
        gd_client.Download(entry, file_path)
        return get_csv(file_path)
    finally:
        try:
            os.remove(file_path)
        except OSError:
            pass
The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets than the old answers here that suggest the gdata library, which is not only too low-level but also overly complicated.
You will also need to create and download (in JSON format) a Service Account key: https://console.developers.google.com/apis/credentials/serviceaccountkey
Here's an example of how to use it:
import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    with open(filename, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())
In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:
#!/usr/bin/python
import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = ""  # (your email here)
    password = getpass.getpass()
    spreadsheet_id = ""  # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
You might try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.
Get a separate login token for the spreadsheets service and substitute that for the export. Adding this to the get_spreadsheet code worked for me:
import gdata.spreadsheet.service

def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()

    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token)  # reset the DocList auth token
Notice I also used Export, as Download seems to give only PDF files.
(Jul 2016) All other answers are pretty much outdated or will be, either because they use GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.
Modern Google API access occurs using API keys (for accessing public data), OAuth2 client IDs (for accessing data owned by users), or service accounts (for accessing data owned by applications/in the cloud) primarily with the Google Cloud client libraries for GCP APIs and Google APIs Client Libraries for non-GCP APIs. For this task, it would be the latter for Python.
To make it happen your code needs authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.
Note that there is also a Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based requests like exporting, for which the Drive API is the correct one to use.
I wrote a blog post that demos exporting a Google Sheet as CSV from Drive. The core part of the script:
# setup
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

# query for file to export
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

# export 1st match (if found)
if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
To learn more about using Google Sheets with Python, see my answer for a similar question. You can also download a Sheet in XLSX and other formats supported by Drive.
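For instance (an illustrative tweak to the script above, not from the original post), exporting as XLSX instead of CSV only requires swapping the target MIME type and file extension:

# export as XLSX instead of CSV (standard XLSX MIME type)
DST_MIMETYPE = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
fn = '%s.xlsx' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]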
If you're completely new to Google APIs, then you need to take a further step back and review these videos first:
How to use Google APIs & create API projects -- the UI has changed but the concepts are still the same
Walkthrough of authorization boilerplate code (Python) -- you can use any supported language to access Google APIs; if you don't do Python, use it as pseudocode to help get you started
Listing your files in Google Drive and code deep dive post
If you already have experience with Google Workspace (formerly G Suite, Google Apps, Google "Docs") APIs and want to see more videos on using both APIs:
Sheets API video library
Drive API video library
Google Workspace (G Suite) Dev Show video series I produced
This no longer works as of gdata 2.0.1.4:
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
Instead, you have to do:
gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))
I wrote pygsheets as an alternative to gspread, but using the Google Sheets API v4. It has an export method to export spreadsheets.
import pygsheets

gc = pygsheets.authorize()

# Open the spreadsheet and then the worksheet
sh = gc.open('my new ssheet')
wks = sh.sheet1

# export as csv
wks.export(pygsheets.ExportType.CSV)
The following code works in my case (Ubuntu 10.04, Python 2.6.5, gdata 2.0.14):
import gdata.docs.service
import gdata.spreadsheet.service

gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email, password)

spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email, password)

#...
file_path = file_path.strip() + ".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.auth_token = docs_token
I've simplified @Cameron's answer even further by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the URL, which might work better.
#!/usr/bin/python
import re, urllib, urllib2

def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

def download(spreadsheet, worksheet, email, password, format="csv"):
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s#gid=%s'
    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    spreadsheet_id = ""  # (spreadsheet id here)
    worksheet_id = ''  # (gid here)
    email = ""  # (your email here)
    password = getpass.getpass()

    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)
I'm using this:
curl 'https://docs.google.com/spreadsheets/d/1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv' on a sheet that is set to publicly readable.
So you would need a Python version of curl, if you can work with public sheets.
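A minimal requests-based stand-in for that curl call (assuming the same publicly readable sheet):

import csv
import io
import requests

url = ("https://docs.google.com/spreadsheets/d/"
       "1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv")
r = requests.get(url)
r.raise_for_status()
for row in csv.reader(io.StringIO(r.text)):
    print(row)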
If you have a sheet with some tabs you don't want to reveal, create a new sheet, and import the ranges you want to publish into tabs on it.
Downloading a spreadsheet from Google Docs is pretty simple using gsheets.
You can follow the detailed documentation at https://pypi.org/project/gsheets/ or follow the steps given below. I recommend reading through the documentation for better coverage.
pip install gsheets
Log in to the Google Developers Console with the Google account whose spreadsheets you want to access. Create (or select) a project and enable the Drive API and Sheets API (under Google Apps APIs).
Go to the Credentials for your project and create New credentials > OAuth client ID > of type Other. In the list of your OAuth 2.0 client IDs click Download JSON for the Client ID you just created. Save the file as client_secrets.json in your home directory (user directory).
Use the following code snippet.
from gsheets import Sheets

sheets = Sheets.from_files('client_secrets.json')
print(sheets)  # will confirm an authenticated connection

s = sheets.get("{SPREADSHEET_URL}")
print(s)  # will confirm your file is accessible

s.sheets[1].to_csv('Spam.csv', encoding='utf-8', dialect='excel')  # will download the worksheet as CSV
This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engine + Python. Not having any experience in the area, I cannot see exactly what portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.
Google AppEngine + Google Docs + Some Python = Simple CMS
Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and thankfully phased out in favor of Gspread). I think this code is even quicker than the earlier answer for getting the contents of the sheet:
import gspread

username = 'sdfsdfsds@gmail.com'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"

client = gspread.login(username, password)
spreadsheet = client.open(sheetname)
worksheet = spreadsheet.sheet1

contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)
(Mar 2019, Python 3) My data is usually not sensitive, and I usually use a table format similar to CSV.
In such a case, one can simply publish the sheet to the web and then use it as a CSV file on a server.
(One publishes it using File -> Publish to the web ... -> Sheet 1 -> Comma separated values (.csv) -> Publish.)
import csv
import io
import requests

url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv"  # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)