I'm trying to access and modify data in a Google Spreadsheet using Python, but I'm having trouble opening the spreadsheet from Python. I closely followed various tutorials and prepared the following before writing any code.
Enabled Google Sheets API and Google Drive API on GCP Console
Generated and downloaded credentials (JSON file) from GCP Console
Spreadsheet: Shared (edit-access) with the client email found in the JSON file
Installed gspread and oauth2client --> pip install gspread oauth2client
The following is the Python code to interface with Google Sheets. The goal, in lines 12 and 13, is to print to the console all of the data found in the linked Google Spreadsheet.
1 import gspread
2 from oauth2client.service_account import ServiceAccountCredentials
3
4 scope = ["https://spreadsheets.google.com/feeds","https://www.googleapis.com/auth/spreadsheets","https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
5 creds = ServiceAccountCredentials.from_json_keyfile_name('Py-Sheets.json', scope)
6 client = gspread.authorize(creds)
7
8 print("Hello World")
9
10 sheet = client.open("Test-Sheets").sheet1
11
12 sample = sheet.get_all_records()
13 print(sample)
Everything seems to run fine up to line 10 (above), where I get an error saying SpreadsheetNotFound. Here's the error in full (below).
Traceback (most recent call last):
File "/home/username/anaconda3/lib/python3.7/site-packages/gspread/client.py", line 119, in open
self.list_spreadsheet_files(title),
File "/home/username/anaconda3/lib/python3.7/site-packages/gspread/utils.py", line 97, in finditem
return next((item for item in seq if func(item)))
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pysheets.py", line 10, in <module>
sheet = client.open("Test-Sheets").sheet1
File "/home/username/anaconda3/lib/python3.7/site-packages/gspread/client.py", line 127, in open
raise SpreadsheetNotFound
gspread.exceptions.SpreadsheetNotFound
I also received the following error via email.
DNS Error: 15698833 DNS type 'mx' lookup of python-spreadsheets-123456.iam.gserviceaccount.com responded with code NXDOMAIN Domain name not found: python-spreadsheets-123456.iam.gserviceaccount.com
How do I fix the error raised when executing line 10? The code is almost exactly the same as what I found in the tutorials, and the spreadsheet is named exactly what I put in client.open(). Does the spreadsheet have to be in a specific Google Drive directory for it to be located?
An alternative would be opening the spreadsheet by URL on Google Colab:
# Access Google Sheets as a data source.
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
# At this point, a link will be printed in the output directing you to a sign-in page. Pick the relevant Google account, copy the verification code it provides, and paste it into the input box shown in the output.
# Load your dataframe
import pandas as pd
wb = gc.open_by_url('https://docs.google.com/spreadsheets/.....') # A URL of your workbook.
sheet1 = wb.worksheet('Sheet1') # Enter your sheet name.
original_df = sheet1.get_all_values()
df = pd.DataFrame(original_df)
df.head()
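If you stay with the service-account approach from the original code, another thing worth trying (a minimal sketch, assuming the sheet is shared with the client_email from the JSON key; the spreadsheet key below is a placeholder) is to open the spreadsheet by its key or full URL instead of its title, which rules out a name mismatch:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/spreadsheets",
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name('Py-Sheets.json', scope)
client = gspread.authorize(creds)

# The key is the long string after /spreadsheets/d/ in the sheet's URL (placeholder here).
sheet = client.open_by_key('<SPREADSHEET_KEY>').sheet1
print(sheet.get_all_records())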
I am trying to extract text data from images using the Google Cloud Vision API. My initial starting point was here. After enabling the Vision API, creating a service account, and generating the JSON key file, I created a script by referring to this example.
Here's my code
from google.cloud import vision
from google.cloud.vision_v1 import types
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'generated_after_creating_sec_key.json'
image_path = 'images\\image_1.png'
vision_client = vision.ImageAnnotatorClient()
print(vision_client)
image = types.Image()
image.source.image_path = image_path
response = vision_client.text_detection(image=image)
for text in response.text_annotations:
print(text.description)
The only difference between the example shown on the Google docs page and my code is that the image in the example is on Google Cloud Storage, while mine is in local storage.
Here's the complete stacktrace.
<google.cloud.vision_v1.ImageAnnotatorClient object at 0x000001DF861D7970>
Traceback (most recent call last):
File "text_detection.py", line 10, in <module>
image.source.image_path = image_path
File "C:\Users\user\ProjectFolder\ProjName\venv\lib\site-packages\proto\message.py", line 677, in __setattr__
pb_type = self._meta.fields[key].pb_type
KeyError: 'image_path'
What is the root cause of this error? Please help!
Per the Google Cloud Vision docs, you should use image.source.image_uri instead.
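As a minimal sketch (using the same imports as the question; the bucket path is a hypothetical placeholder), image_uri covers images hosted on Cloud Storage or the web, while a local file like the one in the question can instead be read as bytes and passed as content:
from google.cloud import vision
from google.cloud.vision_v1 import types

vision_client = vision.ImageAnnotatorClient()

# Image hosted on Cloud Storage or the web: set image_uri.
image = types.Image()
image.source.image_uri = 'gs://your-bucket/image_1.png'  # hypothetical path

# Image on local disk: read the bytes and pass them as content.
# with open('images/image_1.png', 'rb') as f:
#     image = types.Image(content=f.read())

response = vision_client.text_detection(image=image)
for text in response.text_annotations:
    print(text.description)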
I'm trying to move a Python Jupyter scraper script (and json cred file) from my laptop to Google Colab.
I've made a connection between Google Colab and Google Drive.
I've stored the (.ipynb) script and credential JSON file on Google Drive.
However, I can't make the connection between the two (the JSON cred file on Google Drive and Colab) work.
Below is the part of the script that handles the credentials:
# Sheet key
# 1i1bmMt-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx_d7Eo
import gspread
import pandas as pd
import requests
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
# Access credentials for google sheet and access the google sheet
scope = ["https://spreadsheets.google.com/feeds",
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/drive.file",
"https://www.googleapis.com/auth/drive"]
# Copy your path to your credential JSON file.
PATH_TO_CREDENTIAL = '/Users/user/json-keys/client_secret.json'
# Initiate your credential
credentials = ServiceAccountCredentials.from_json_keyfile_name(PATH_TO_CREDENTIAL, scope)
# Authorize your connection to your google sheet
gc = gspread.authorize(credentials)
I receive a FileNotFoundError: and credential errors.
Hope someone can help me with this, thanks
Try putting the JSON file in the same directory as your script first, to make sure the file itself is okay and the script runs successfully.
If client_secret.json is in the same directory as the file you're running, then the correct syntax is:
import os

# Build an absolute path to client_secret.json relative to this script.
DIRNAME = os.path.dirname(__file__)
credentials = ServiceAccountCredentials.from_json_keyfile_name(os.path.join(DIRNAME, 'client_secret.json'), scope)
If the above test is okay, move the file to your target directory '/Users/user/json-keys/client_secret.json' and create a symbolic link to client_secret.json in the current directory. Then run the program with the above code again to confirm the file causes no problem in that location. It's a workaround.
I used this case as a reference:
Django not recognizing or seeing JSON file
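Since the script now runs in Colab, another option (a sketch, assuming the JSON key file is stored somewhere in your Google Drive; adjust the path below to its actual location) is to mount Drive in the notebook and point the credential path at the mounted file:
from google.colab import drive
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# Mount Google Drive inside the Colab runtime.
drive.mount('/content/drive')

scope = ["https://spreadsheets.google.com/feeds",
         "https://www.googleapis.com/auth/spreadsheets",
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]

# Hypothetical path: change it to wherever the key file sits in your Drive.
PATH_TO_CREDENTIAL = '/content/drive/MyDrive/json-keys/client_secret.json'
credentials = ServiceAccountCredentials.from_json_keyfile_name(PATH_TO_CREDENTIAL, scope)
gc = gspread.authorize(credentials)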
Set-up
I have a local folder – access_google_drive – which contains the .py script used to access my Google Drive account via the Google Drive API.
The script looks like this,
def connect_google_drive_api():
import os
# use Gdrive API to access Google Drive
os.chdir('/Users/my/fol/ders/access_google_drive')
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth() # client_secrets.json need to be in the same directory as the script
drive = GoogleDrive(gauth)
return drive
In the access_google_drive folder I also have the client_secrets.json file.
Issue
This set-up worked fine until yesterday. Since then, I see the following error:
Failed to find "code" in the query parameters of the redirect.
Try command-line authentication
Traceback (most recent call last):
File "<ipython-input-8-48c5ad9148cf>", line 2, in <module>
gauth.LocalWebserverAuth() # client_secrets.json need to be in the same directory as the script
File "/opt/anaconda3/lib/python3.7/site-packages/pydrive/auth.py", line 115, in _decorated
code = decoratee(self, *args, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/pydrive/auth.py", line 241, in LocalWebserverAuth
raise AuthenticationError('No code found in redirect')
AuthenticationError: No code found in redirect
Question
I have no idea why I'm seeing this error. Both the script and the file are in the same folder. I haven't edited the script or the .json.
Did I miss an update? Are the stars not aligned?
Who can help me out?
I have an AI Platform VM instance set up with a Python 3 notebook. I also have a Google Cloud Storage bucket that contains numerous .CSV and .SAV files. I have no difficulty using standard Python packages like pandas to read data from the CSV files, but my notebook appears unable to locate the .SAV files in my storage bucket.
Does anyone know what is going on here and/or how I can resolve this issue?
import numpy as np
import pandas as pd
import pyreadstat
df = pd.read_spss("gs://<STORAGE_BUCKET>/datafile.sav")
---------------------------------------------------------------------------
PyreadstatError Traceback (most recent call last)
<ipython-input-10-30836249273f> in <module>
----> 1 df = pd.read_spss("gs://<STORAGE_BUCKET>/datafile.sav")
/opt/conda/lib/python3.7/site-packages/pandas/io/spss.py in read_spss(path, usecols, convert_categoricals)
41
42 df, _ = pyreadstat.read_sav(
---> 43 path, usecols=usecols, apply_value_formats=convert_categoricals
44 )
45 return df
pyreadstat/pyreadstat.pyx in pyreadstat.pyreadstat.read_sav()
pyreadstat/_readstat_parser.pyx in pyreadstat._readstat_parser.run_conversion()
PyreadstatError: File gs://<STORAGE_BUCKET>/datafile.sav does not exist!
The read_spss function can only read from a local file path:
path : str or Path - File path.
Compare that with the read_csv function:
filepath_or_buffer : str, path object or file-like object -
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected.
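One workaround (a minimal sketch, assuming the google-cloud-storage client library is available on the VM; bucket and object names are placeholders) is to copy the .sav file from the bucket to local disk first, then call read_spss on the local path:
from google.cloud import storage
import pandas as pd

# Download the .sav file from the bucket to local disk.
client = storage.Client()
bucket = client.bucket('<STORAGE_BUCKET>')
blob = bucket.blob('datafile.sav')
blob.download_to_filename('/tmp/datafile.sav')

# read_spss can now work with the local file path.
df = pd.read_spss('/tmp/datafile.sav')
df.head()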
I have some data files uploaded on my google drive.
I want to import those files into google colab.
The REST API and PyDrive methods show how to create a new file and upload it to Drive from Colab. From those, I can't figure out how to read the data files already present on my Drive in my Python code.
I am a total newbie to this. Can someone help me out?
(Update April 15, 2018: gspread is updated frequently, so to ensure a stable workflow I pin the package versions.)
For spreadsheet files, the basic idea is to use the gspread and pandas packages to read spreadsheets in Drive and convert them to pandas DataFrame format.
In the Colab notebook:
#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0
#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user() # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
There are then three ways I know of to read Google spreadsheets.
By file name:
spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By url:
spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete url (the edit#gid means permission)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By file key/ID:
spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the url following spreadsheet/d/)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
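Since gspread_dataframe is imported above but not used, a small follow-up sketch: get_as_dataframe can build the DataFrame directly from the worksheet object obtained by any of the three methods above.
# Alternative: build the DataFrame straight from the worksheet object.
df2 = get_as_dataframe(sheet)
df2.head()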
I shared the code above in a Colab notebook:
https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing
Source: https://github.com/burnash/gspread
1) Set your data to be publicly available. Then, for public spreadsheets:
# StringIO moved to the io module in Python 3.
from io import StringIO
import pandas as pd
import requests

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
df = pd.read_csv(StringIO(r.text), index_col=0, parse_dates=['Quradate'])
df.head()
More here: Getting Google Spreadsheet CSV into A Pandas Dataframe
If the data is private it's much the same, but you will have to do some auth gymnastics...
From Google Colab snippets
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Your spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)