Get CSV from google drive and then load to pandas - python

My goal is to read a .csv file from Google Drive and load it into a dataframe.
I tried some answers here, but the file is not public and needs authentication.
I looked into the Google Drive API but got stuck and don't know how to move forward. I did manage to open a Google Sheet and load it into a dataframe, but that is a different case. This is the Google Sheets sample that works:
service = build('sheets', 'v4', credentials=creds)
sheet = service.spreadsheets()
sheets_file = sheet.values().get(
    spreadsheetId=sheet_id,
    range=sheet_range
).execute()
header = sheets_file.get('values', [])[0]  # Assumes first line is header!
values = sheets_file.get('values', [])[1:]  # Everything else is data.
if not values:
    print('No data found.')
else:
    all_data = []
    for col_id, col_name in enumerate(header):
        column_data = []
        for row in values:
            column_data.append(row[col_id])
        ds = pd.Series(data=column_data, name=col_name)
        all_data.append(ds)
    df = pd.concat(all_data, axis=1)
    print(df.head())
I saw some Google Colab methods too, but I can't use them because I am restricted to plain Python. Any idea how to approach this?

I believe your goal and situation are as follows:
You want to download the CSV data from a CSV file on Google Drive.
You can already get values from a Google Spreadsheet using googleapis for Python.
Pattern 1:
In this pattern, the CSV data is downloaded with googleapis and saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.
Sample script:
import io
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
file_id = "###" # Please set the file ID of the CSV file.
service = build('drive', 'v3', credentials=creds)  # creds is the credentials object you already use for Sheets.
request = service.files().get_media(fileId=file_id)
fh = io.FileIO("sample.csv", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
In this case, the CSV data can be converted to the dataframe with df = pd.read_csv("sample.csv").
Pattern 2:
In this pattern, as a simpler method, the access token from creds is used directly. The downloaded CSV data is not saved as a file. The data is retrieved with the "Files: get" method of Drive API v3.
Sample script:
file_id = "###" # Please set the file ID of the CSV file.
access_token = creds.token
url = "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res.text)
In this case, the CSV data can be directly converted to the dataframe with df = pd.read_csv(io.StringIO(res.text)).
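One thing the Pattern 2 snippet does not show is error handling: if the token lacks a Drive scope or the file ID is wrong, res.text holds a JSON error body rather than CSV. The following is a minimal sketch that checks the HTTP status before parsing, assuming requests and pandas are installed. The helper names drive_media_url and read_drive_csv, and the 30-second timeout, are choices made for this illustration and are not part of the answer above.

```python
import io


def drive_media_url(file_id):
    """Build the Drive API v3 download URL used in Pattern 2."""
    return "https://www.googleapis.com/drive/v3/files/" + file_id + "?alt=media"


def read_drive_csv(file_id, access_token):
    """Download a private CSV file and return it as a DataFrame (hypothetical helper)."""
    # Imported here so the URL helper above works without third-party packages.
    import pandas as pd
    import requests

    res = requests.get(
        drive_media_url(file_id),
        headers={"Authorization": "Bearer " + access_token},
        timeout=30,  # arbitrary value chosen for this sketch
    )
    # Raise requests.HTTPError on 401/403/404 instead of parsing an error body as CSV.
    res.raise_for_status()
    return pd.read_csv(io.StringIO(res.text))
```

With the creds object from the question, the call would look like df = read_drive_csv(file_id, creds.token).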
Note:
For both scripts, please include the scope https://www.googleapis.com/auth/drive.readonly and/or https://www.googleapis.com/auth/drive. If you modify the scopes, please reauthorize so that the new scopes are included in the access token. Please be careful about this.
Reference:
Download files

Related

EXPORT AS XLSX format python google sheet api

May I know what I need to do to export a file in Google Sheets as XLSX?
My code below works, but I also need to save the file in XLSX format.
Here's my code:
from oauth2client.service_account import ServiceAccountCredentials
import gsheets
pdkey = "keypd.json"
url = f"https://docs.google.com/spreadsheets/d/1MCkqb_123123123123asdasdada/edit#gid=0"
SCOPE = ["https://spreadsheets.google.com/feeds", 'https://www.googleapis.com/auth/spreadsheets',
         "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
CREDS = ServiceAccountCredentials.from_json_keyfile_name(pdkey, SCOPE)
sheets = gsheets.Sheets(CREDS)
sheet = sheets.get(url)
sheet[0].to_csv("/root/xlsx/SAMPLE.csv")
In your situation, how about exporting the Spreadsheet with the export method of the Drive API? Applied to your script, it becomes as follows.
Modified script:
From:
sheets = gsheets.Sheets(CREDS)
sheet = sheets.get(url)
sheet[0].to_csv("/root/xlsx/SAMPLE.csv")
To:
spreadsheet_id = "###" # Please set the Spreadsheet ID (the part of the Spreadsheet URL between /d/ and /edit).
access_token = CREDS.create_delegated(CREDS._service_account_email).get_access_token().access_token
url = "https://www.googleapis.com/drive/v3/files/" + spreadsheet_id + "/export?mimeType=application%2Fvnd.openxmlformats-officedocument.spreadsheetml.sheet"
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
# If you want to create the XLSX data as a file, you can use the following script.
with open("sample.xlsx", 'wb') as f:
    f.write(res.content)
In this script, please add import requests.
In this modified script, the Spreadsheet is exported as XLSX data using the method of export in Drive API. The access token is retrieved from the service account.
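The export request needs a Spreadsheet ID, while the question only has the full Spreadsheet URL. A small sketch of how the ID and the export URL could be derived, assuming the usual /spreadsheets/d/<id>/ URL form. The helper names spreadsheet_id_from_url and export_url are made up for this example.

```python
import re
import urllib.parse

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def spreadsheet_id_from_url(url):
    """Extract the ID from a docs.google.com Spreadsheet URL (assumed URL layout)."""
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if match is None:
        raise ValueError("No spreadsheet ID found in URL: " + url)
    return match.group(1)


def export_url(spreadsheet_id, mime_type=XLSX_MIME):
    """Build the Drive API v3 export URL, percent-encoding the MIME type."""
    query = urllib.parse.urlencode({"mimeType": mime_type})
    return "https://www.googleapis.com/drive/v3/files/" + spreadsheet_id + "/export?" + query


# The URL from the question.
sheet_url = "https://docs.google.com/spreadsheets/d/1MCkqb_123123123123asdasdada/edit#gid=0"
spreadsheet_id = spreadsheet_id_from_url(sheet_url)
url = export_url(spreadsheet_id)
```

The resulting url can be passed to requests.get together with the Authorization header, as in the modified script.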
Reference:
Files: export

How to read a link from a cell in Google Spreadsheet if it's inside href tag (gspread)

I am new to Stack Overflow, so I apologize in advance if I do something wrong.
I have a spreadsheet on Google Sheets, for example, this one.
There is a link in a cell, inside an href tag. I want to get the link and the text of the cell using the Google Sheets API or gspread.
I have already tried this solution, but I get an access token of 'None'.
I have tried web scraping with BeautifulSoup, but that didn't work either.
For the bs4 approach, I tried this code, which I found here:
from bs4 import BeautifulSoup
import requests
html = requests.get('https://docs.google.com/spreadsheets/d/1v8vM7yQ-27SFemt8_3IRiZr-ZauE29edin-azKpigws/edit#gid=0').text
soup = BeautifulSoup(html, "lxml")
tables = soup.find_all("table")
content = []
for table in tables:
    content.append([[td.text for td in row.find_all("td")] for row in table.find_all("tr")])
print(content)
I figured it out. Here's the full code if anyone needs it
import requests
import gspread
import urllib.parse
import pickle
spreadsheetId = "###" # Please set the Spreadsheet ID.
cellRange = "Yoursheetname!A1:A100" # Please set the range with A1Notation. With the indexing used below, the hyperlink of the first cell of this range (A1) is retrieved.
with open('token_sheets_v4.pickle', 'rb') as token:
    # get this file here
    # https://developers.google.com/identity/sign-in/web/sign-in
    credentials = pickle.load(token)
client = gspread.authorize(credentials)
# 1. Retrieve the access token.
access_token = client.auth.token
# 2. Request to the method of spreadsheets.get in Sheets API using `requests` module.
fields = "sheets(data(rowData(values(hyperlink))))"
url = "https://sheets.googleapis.com/v4/spreadsheets/" + spreadsheetId + "?ranges=" + urllib.parse.quote(cellRange) + "&fields=" + urllib.parse.quote(fields)
res = requests.get(url, headers={"Authorization": "Bearer " + access_token})
print(res)
# 3. Retrieve the hyperlink.
obj = res.json()
print(obj)
link = obj["sheets"][0]['data'][0]['rowData'][0]['values'][0]['hyperlink']
print(link)
UPDATE!!
A more elegant solution is this. Creating the service:
import os
import pickle
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
CLIENT_SECRET_FILE = 'secret/secret.json'
API_SERVICE_NAME = 'sheets'
API_VERSION = 'v4'
SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
def Create_Service():
    cred = None
    pickle_file = f'secret/token_{API_SERVICE_NAME}_{API_VERSION}.pickle'
    # Reuse the stored token if one exists.
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as token:
            cred = pickle.load(token)
    # Refresh an expired token, or run the browser flow if there is no usable token.
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
            cred = flow.run_local_server()
        with open(pickle_file, 'wb') as token:
            pickle.dump(cred, token)
    try:
        service = build(API_SERVICE_NAME, API_VERSION, credentials=cred)
        print(API_SERVICE_NAME, 'service created successfully')
        return service
    except Exception as e:
        print('Unable to connect.')
        print(e)
        return None
service = Create_Service()
And extracting the links from each sheet of a spreadsheet in the form of convenient dictionaries:
fields = "sheets(properties(title),data(startColumn,rowData(values(hyperlink))))"
print(service.spreadsheets().get(spreadsheetId=self.__spreadsheet_id,
                                 fields=fields).execute())
So, how do fields work? Go to the Spreadsheet object description and look for its JSON representation. To return, for example, the sheets object from that JSON representation, just use fields = "sheets", because Spreadsheet has a "sheets" field in its JSON representation.
OK, now we have the sheets object. How do we access the fields of a sheet object? Just click on it in the documentation and look at its fields.
How do we combine fields? For example, to return the "properties" and "data" fields of the sheets object, write the fields string as fields = "sheets(properties,data)". We list them like the arguments of an ordinary function, but without spaces.
The same applies to the objects nested inside data, and so on.
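The nesting of the fields string mirrors the nesting of the response, so the links can be collected with one loop per level. Below is a sketch of that traversal. The function name extract_links is invented here, and sample_response is a hand-written dictionary shaped like the response for fields = "sheets(properties(title),data(rowData(values(hyperlink))))", not real API output. It assumes that cells without a link may come back as empty objects and that empty rows may lack a "values" key.

```python
def extract_links(response):
    """Map each sheet title to the hyperlinks found in its cells."""
    links_by_sheet = {}
    for sheet in response.get("sheets", []):
        title = sheet.get("properties", {}).get("title", "")
        links = []
        for grid in sheet.get("data", []):
            for row in grid.get("rowData", []):
                for cell in row.get("values", []):
                    # Skip cells that carry no hyperlink.
                    if "hyperlink" in cell:
                        links.append(cell["hyperlink"])
        links_by_sheet[title] = links
    return links_by_sheet


# Hand-written sample, not real API output.
sample_response = {
    "sheets": [
        {
            "properties": {"title": "Sheet1"},
            "data": [
                {
                    "rowData": [
                        {"values": [{"hyperlink": "https://example.com/a"}]},
                        {"values": [{}]},
                        {},
                        {"values": [{"hyperlink": "https://example.com/b"}]},
                    ]
                }
            ],
        }
    ]
}

links = extract_links(sample_response)
```

With a real service object, the argument would be the dictionary returned by service.spreadsheets().get(...).execute().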

Where does Google's Drive API store downloaded files?

I'm attempting to download a file from Google Drive using Python, and I'm not sure where the file is being stored.
Following the example here: https://developers.google.com/drive/api/v3/manage-downloads#python
Code:
def DownloadGoogleFile(id: int):
    file = str(id) + '.txt'
    creds = GetGoogleCredentials()
    service = build('drive', 'v3', credentials=creds)
    # Call the Drive v3 API
    FileSearch = service.files().list(q="name='{0}'".format(file), fields="nextPageToken, files(id, name)").execute()
    FoundFiles = FileSearch.get('files', [])
    if FoundFiles:
        FileID = FoundFiles[0]['id']
        request = service.files().get_media(fileId=FileID)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print ("Download %d%%." % int(status.progress() * 100))
    else:
        output = 'No file found'
I'm getting the output Download 100%, but that's it. I can't find the file anywhere. I was thinking it would be in the same directory as the Python file, but there isn't anything there. I also thought I might need fh=io.FileIO(file) to specify where I want to save the file, but I get a 'no file exists' error when doing that, so I'm not sure.
Following the example from the docs, you should be able to just replace
fh = io.BytesIO()
With
fh = io.FileIO('filename.extension', mode='wb')
io.BytesIO() is an in-memory file-like object and is not written to disk. Opening the file with mode='wb' also creates it if it does not exist yet.
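To make the difference concrete: once the download loop in the question finishes, the bytes are held in fh and nothing has touched the disk. Here is a small sketch using only the standard library, in which the hard-coded bytes stand in for what MediaIoBaseDownload would write into the buffer. The helper name save_buffer and the output file name are arbitrary choices for this example.

```python
import io
import os
import tempfile


def save_buffer(buffer, path):
    """Write the full contents of an in-memory buffer to a file on disk."""
    with open(path, "wb") as out:
        # getvalue() returns everything written so far, regardless of the cursor position.
        return out.write(buffer.getvalue())


fh = io.BytesIO()
fh.write(b"id,name\n1,alpha\n")  # stand-in for the chunks written by downloader.next_chunk()

target = os.path.join(tempfile.gettempdir(), "drive_download_example.csv")
bytes_written = save_buffer(fh, target)
```

If the goal is a dataframe instead of a file, the buffer can also be rewound with fh.seek(0) and passed to pd.read_csv directly.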

google api (sheets) Request had insufficient authentication scopes

I want to read and write data from a sheet. Reading works fine, but writing doesn't. I use all the scopes mentioned in the documentation: https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/append
data_writer(1,1,1)
code:
from __future__ import print_function
from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
# Setup the Sheets API
SCOPES = 'https://www.googleapis.com/auth/spreadsheets'+"https://www.googleapis.com/auth/drive.file"+"https://www.googleapis.com/auth/drive"
store = file.Storage('credentials.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
service = apiclient.discovery.build('sheets', 'v4', http=creds.authorize(Http()))
# Call the Sheets API
SPREADSHEET_ID = '1JwVOqtUCWBMm_O6esIb-9J4TgqAmMIdYm9sf5y-A7EM'
RANGE_NAME = 'Jokes!A:C'
# How the input data should be interpreted.
value_input_option = 'USER_ENTERED'
# How the input data should be inserted.
insert_data_option = 'INSERT_ROWS'
def data_reader():
    #reading data
    read = service.spreadsheets().values().get(spreadsheetId=SPREADSHEET_ID,range=RANGE_NAME).execute()
    #reading values
    values = read.get('values', [])
    if not values:
        print('No data found.')
    else:
        for row in values:
            print(row[2])
            continue
def data_writer(score,num_comments,mystring):
    value_range_body = {
        "score":score,
        "num_comments":num_comments,
        "joke":mystring
    }
    request = service.spreadsheets().values().append(spreadsheetId=SPREADSHEET_ID, range=RANGE_NAME, valueInputOption=value_input_option, insertDataOption=insert_data_option, body=value_range_body)
    response = request.execute()
SCOPES must be of type list:
SCOPES = ['https://www.googleapis.com/auth/spreadsheets', "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
Side note: you have
.../auth/drive
And
.../auth/drive.file
/drive.file limits access to the files that the app has created or that the user has opened with the app, whereas /drive grants access to all of the user's Drive files. So you should pick the one that fits your needs.
Side note 2:
Based on the link you've provided, you need at least one of the listed scopes to work with spreadsheets, so you may not need all of them either.
First off, you should only need https://www.googleapis.com/auth/drive, as it gives full access to a user's Drive account, including reading and writing sheets.
list of Sheet scopes
list of drive scopes
If you have already run your code once and authenticated your user, and then changed the scopes in your code, remember that you will need to run your code again and re-authenticate the user to gain the access granted by the new scopes.
Side note:
SCOPES = 'https://www.googleapis.com/auth/spreadsheets'+"https://www.googleapis.com/auth/drive.file"+"https://www.googleapis.com/auth/drive"
is just going to be one long string. You need to separate the scopes with a space or, as the other answer states, use an array:
SCOPES = 'https://www.googleapis.com/auth/spreadsheets ' + "https://www.googleapis.com/auth/drive.file " + "https://www.googleapis.com/auth/drive"
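The effect of the missing spaces is easy to check without calling any Google API. The sketch below compares the concatenated string from the question with the space-separated version. The helper normalize_scopes is a made-up convenience that accepts either form and always returns a list.

```python
def normalize_scopes(scopes):
    """Return scopes as a list, accepting either a list or a space-separated string."""
    if isinstance(scopes, str):
        return scopes.split()
    return list(scopes)


# The string from the question: no separators between the three URLs.
broken = ('https://www.googleapis.com/auth/spreadsheets'
          + "https://www.googleapis.com/auth/drive.file"
          + "https://www.googleapis.com/auth/drive")

# The same scopes separated by spaces.
fixed = ('https://www.googleapis.com/auth/spreadsheets '
         + "https://www.googleapis.com/auth/drive.file "
         + "https://www.googleapis.com/auth/drive")

print(len(normalize_scopes(broken)))  # 1: the three URLs are fused into one invalid scope
print(len(normalize_scopes(fixed)))   # 3
```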

How to download a file with python-google-api

How would I download a file using the GoogleAPI? Here is what I have so far:
CLIENT_ID = '255556'
CLIENT_SECRET = 'y8sR1'
DOCUMENT_ID = 'a123'
service=build('drive', 'v2')
# How to do the following line?
service.get_file(CLIENT_ID, CLIENT_SECRET, DOCUMENT_ID)
There are different ways to download a file using the Google Drive API. It depends on whether you are downloading a regular file or a Google document (which needs to be exported in a specific format).
For regular files stored in Drive, you can use either of two methods.
The first is alt=media, which is the preferred option, as in:
GET https://www.googleapis.com/drive/v2/files/0B9jNhSvVjoIVM3dKcGRKRmVIOVU?alt=media
Authorization: Bearer ya29.AHESVbXTUv5mHMo3RYfmS1YJonjzzdTOFZwvyOAUVhrs
The other method is to use downloadUrl, as in:
from apiclient import errors
# ...
def download_file(service, drive_file):
    """Download a file's content.
    Args:
        service: Drive API service instance.
        drive_file: Drive File instance.
    Returns:
        File's content if successful, None otherwise.
    """
    download_url = drive_file.get('downloadUrl')
    if download_url:
        resp, content = service._http.request(download_url)
        if resp.status == 200:
            print('Status: %s' % resp)
            return content
        else:
            print('An error occurred: %s' % resp)
            return None
    else:
        # The file doesn't have any content stored on Drive.
        return None
For google documents, instead of using downloadUrl, you need to use exportLinks and specify the mime type, for example:
download_url = file['exportLinks']['application/pdf']
The rest of the documentation can be found here:
https://developers.google.com/drive/web/manage-downloads
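The choice between the two link types can be folded into one helper. This is a sketch against Drive API v2 file metadata as described above (downloadUrl for regular files, exportLinks for Google documents). The function name pick_download_url and the sample dictionaries are invented for illustration and do not come from a real API response.

```python
def pick_download_url(drive_file, export_mime_type="application/pdf"):
    """Return a URL for fetching the file's content, or None if there is none."""
    # Regular files stored on Drive expose downloadUrl.
    download_url = drive_file.get("downloadUrl")
    if download_url:
        return download_url
    # Google documents have no downloadUrl; they offer one export link per MIME type.
    return drive_file.get("exportLinks", {}).get(export_mime_type)


# Invented metadata dictionaries for the three cases.
regular_file = {"title": "data.csv", "downloadUrl": "https://example.com/download"}
google_doc = {"title": "Report", "exportLinks": {"application/pdf": "https://example.com/export-pdf"}}
no_content = {"title": "Folder"}
```

The returned URL can then be requested with service._http.request(...), as in download_file above.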
