I am trying to retrieve file metadata from the Google Drive API v3 in Python. This worked for me with API v2, but fails with v3.
I tried to get the metadata with this line:
data = DRIVE.files().get(fileId=file['id']).execute()
but all I got was a dict with 'id', 'kind', 'name', and 'mimeType'. How can I get 'md5Checksum', 'fileSize', and so on?
According to the documentation, get() should return all of the metadata, but I only get a small part of it.
Here is my code:
from __future__ import print_function
import os

from apiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

# Multiple scopes go in one space-separated string.
SCOPES = ('https://www.googleapis.com/auth/drive.metadata '
          'https://www.googleapis.com/auth/drive')
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('storage.json', scope=SCOPES)
    creds = tools.run_flow(flow, store)

DRIVE = build('drive','v3', http=creds.authorize(Http()))
files = DRIVE.files().list().execute().get('files',[])
for file in files:
    print('\n',file['name'],file['id'])
    data = DRIVE.files().get(fileId=file['id']).execute()
    print('\n',data)
print('Done')
I tried this answer:
Google Drive API v3 Migration
List
Files returned by service.files().list() do not contain information now, i.e. every field is null. If you want list on v3 to behave like in v2, call it like this:
service.files().list().setFields("nextPageToken, files");
but I get a Traceback:
DRIVE.files().list().setFields("nextPageToken, files")
AttributeError: 'HttpRequest' object has no attribute 'setFields'
To get the MD5 hash of a file given its fileId, request it through the fields parameter:
DRIVE = build('drive','v3', http=creds.authorize(Http()))
file_service = DRIVE.files()
remote_file_hash = file_service.get(fileId=fileId, fields="md5Checksum").execute()['md5Checksum']
To list some files on the Drive:
results = file_service.list(pageSize=10, fields="files(id, name)").execute()
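As a rough sketch of how the fields parameter combines with paging: list() returns one page per call, so the loop below follows nextPageToken until it is absent. The helper name iter_files and its default field list are my own choices for illustration, not part of the API, and files_resource is assumed to be the object returned by DRIVE.files().

```python
def iter_files(files_resource, file_fields=("id", "name", "md5Checksum"), page_size=100):
    """Yield every file dict, following nextPageToken across pages.

    files_resource is assumed to be DRIVE.files(); iter_files itself is a
    hypothetical helper, not part of google-api-python-client.
    """
    # nextPageToken has to be requested explicitly once fields is set,
    # otherwise the loop would stop after the first page.
    fields = "nextPageToken, files({})".format(", ".join(file_fields))
    page_token = None
    while True:
        response = files_resource.list(
            pageSize=page_size, fields=fields, pageToken=page_token
        ).execute()
        for item in response.get("files", []):
            yield item
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Usage would look like: for f in iter_files(DRIVE.files()): print(f['name'], f.get('md5Checksum')). I read md5Checksum with .get() because not every file has one.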
I have built a small application gDrive-auto-sync containing more examples of API usage.
It's well-documented, so you can have a look at it if you want.
Here is the main file containing all the code. It might look like a lot, but more than half of the lines are just comments.
If you want to retrieve all the fields of a file resource, simply set fields='*'.
In your example above, you would run
data = DRIVE.files().get(fileId=file['id'], fields='*').execute()
This should return all the available fields for the file, as listed in:
https://developers.google.com/drive/v3/reference/files
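If you only need a few properties, naming them keeps the response smaller than '*'. A minimal sketch, assuming files_resource is DRIVE.files(); get_metadata is a hypothetical helper of mine. Note that v3 renamed the v2 fileSize property to size, and that md5Checksum is, as far as I know, only populated for files with binary content (not native Google Docs), so missing keys are filled with None.

```python
def get_metadata(files_resource, file_id,
                 wanted=("id", "name", "mimeType", "md5Checksum", "size")):
    """Fetch only the named fields for one file.

    files_resource is assumed to be DRIVE.files(); get_metadata is a
    hypothetical helper, not part of google-api-python-client.
    """
    response = files_resource.get(
        fileId=file_id, fields=",".join(wanted)
    ).execute()
    # Fields the file does not have are simply absent from the response,
    # so fill them with None instead of raising KeyError later.
    return {key: response.get(key) for key in wanted}
```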
There is a library, PyDrive, that provides easy interaction with Google Drive:
https://googledrive.github.io/PyDrive/docs/build/html/filelist.html
Their example:
from pydrive.drive import GoogleDrive
drive = GoogleDrive(gauth) # Create GoogleDrive instance with authenticated GoogleAuth instance
# Auto-iterate through all files in the root folder.
file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()
for file1 in file_list:
    print('title: %s, id: %s' % (file1['title'], file1['id']))
All you need is file1['your key']. Note that PyDrive wraps Drive API v2, so the file name is under 'title' rather than 'name'.
Related
I am trying to download a Google Sheets document as a Microsoft Excel document using Python. I have been able to accomplish this task using the Python module googleapiclient.
However, the Sheets document may contain some formulas which are not compatible with Microsoft Excel (https://www.dataeverywhere.com/article/27-incompatible-formulas-between-excel-and-google-sheets/).
When I use the application I created on any Google Sheets document that used any of these formulas anywhere, I get a bogus Microsoft Excel document as output.
I would like to read the cell values in the Google Sheets document before downloading it as a Microsoft Excel document, just to prevent any such errors from happening.
The code I have written thus far is attached below:
import sys
import os

from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = "https://www.googleapis.com/auth/drive.readonly"
store = file.Storage("./credentials/credentials.json")
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets("credentials/client_secret.json",
                                          SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build("drive", "v3", http = creds.authorize(Http()))

print("Usage: tmp.py <name of the spreadsheet>")
FILENAME = sys.argv[1]
SRC_MIMETYPE = "application/vnd.google-apps.spreadsheet"
DST_MIMETYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

files = DRIVE.files().list(
    q = 'name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy = "modifiedTime desc,name").execute().get("files", [])

if files:
    fn = '%s.xlsx' % os.path.splitext(files[0]["name"].replace(" ", "_"))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end = "")
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, "wb") as f:
            f.write(data)
        print("Done")
    else:
        print("ERROR: Could not download file")
else:
    print("ERROR: File not found")
If you want to use Python to export something from Google Docs, the simplest way is to let Google's own server do the job for you.
I was doing a little web scraping on Google Sheets and wrote this small program to do it. You just have to insert the ID of the document you want to download.
I put in a temporary ID so anyone can try it out.
import requests
ext = 'xlsx' #csv, ods, html, tsv and pdf can be used as well
key = '1yEoHh7WL1UNld-cxJh0ZsRmNwf-69uINim2dKrgzsLg'
url = f'https://docs.google.com/spreadsheets/d/{key}/export?format={ext}'
res = requests.get(url)
with open(f'file.{ext}', 'wb') as f:
    f.write(res.content)
The conversion should match what you get in the browser, because this is the same as clicking the export button in the browser version of Google Sheets. Note that an unauthenticated request like this only works for documents shared with anyone who has the link.
If you are planning to work with the data inside Python, I recommend using the csv format instead of xlsx and then recreating the necessary formulas in Python.
I think the gspread library might be what you are looking for. https://gspread.readthedocs.io/en/latest/
Here's a code sample:
import tenacity
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# @tenacity.retry(wait=tenacity.wait_exponential())  # Uncomment to wait and retry if you exceed the Google API quota
def loadGoogleSheet(spreadsheet_name):
    # use creds to create a client to interact with the Google Drive API
    print("Connecting to Google API...")
    scope = [
        'https://spreadsheets.google.com/feeds',
        'https://www.googleapis.com/auth/drive'
    ]
    creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
    client = gspread.authorize(creds)
    spreadsheet = client.open(spreadsheet_name)
    return spreadsheet

def readGoogleSheet(spreadsheet):
    sheet = spreadsheet.sheet1 # Might need to loop through sheets or whatever
    val = sheet.cell(1, 1).value # This just gets the value of the first cell. The docs I linked to above are pretty helpful on all the other stuff you can do
    return val

test_spreadsheet = loadGoogleSheet('Copy of TLO Summary - Template DO NOT EDIT')
test_output = readGoogleSheet(test_spreadsheet)
print(test_output)
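Once you can read cells, the pre-export check the question asks for could look roughly like this. This is only a sketch under assumptions: rows is assumed to be a list of lists of cell contents with formulas still in their "=..." text form (how you fetch them is not shown here), the function set is a small illustrative sample rather than a complete list of incompatible formulas, and find_incompatible_cells is a name I made up.

```python
import re

# Illustrative, NOT exhaustive: a few functions that exist in Google Sheets
# but have no direct Excel equivalent. Extend this set for your own needs.
SHEETS_ONLY_FUNCTIONS = {"QUERY", "IMPORTRANGE", "IMPORTXML", "GOOGLEFINANCE"}

# Heuristic: an identifier directly followed by "(" is treated as a function call.
# Text inside string literals that looks like a call would also match.
_FUNCTION_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\s*\(")


def find_incompatible_cells(rows, blocked=SHEETS_ONLY_FUNCTIONS):
    """Return (row, column, function) tuples, 1-indexed, for blocked functions."""
    hits = []
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            if not isinstance(cell, str) or not cell.startswith("="):
                continue  # plain value, nothing to check
            for name in _FUNCTION_NAME.findall(cell):
                if name.upper() in blocked:
                    hits.append((r, c, name.upper()))
    return hits
```

If the returned list is non-empty, you could skip the export or warn the user which cells are affected.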
Consider the following code that uses the PyDrive module:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file = drive.CreateFile({'title': 'test.txt'})
file.Upload()
file.SetContentString('hello')
file.Upload()
file.SetContentString('')
file.Upload() # This throws an exception.
Creating file and changing its contents works fine until I try to erase the contents by setting the content string to an empty one. Doing so throws this exception:
pydrive.files.ApiRequestError
<HttpError 400 when requesting
https://www.googleapis.com/upload/drive/v2/files/{LONG_ID}?alt=json&uploadType=resumable
returned "Bad Request">
When I look at my Drive, I see the test.txt file successfully created with text hello in it. However I expected that it would be empty.
If I change the empty string to any other text, the file is changed twice without errors. Though this doesn't clear the contents so it's not what I want.
When I looked up the error on the Internet, I found this issue on the PyDrive GitHub that may be related, though it has remained unsolved for almost a year.
If you want to reproduce the error, you have to create your own project that uses Google Drive API following this tutorial from the PyDrive docs.
How can one erase the contents of a file through PyDrive?
Issue and workaround:
It seems that 0-byte data cannot be uploaded when resumable=True is used, so the empty content has to be uploaded without resumable=True. However, looking at the PyDrive source, resumable=True appears to be the default. As a workaround, I propose using the requests module, with the access token retrieved from PyDrive's gauth.
With that change, your script becomes as follows.
Modified script:
import io
import requests
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file = drive.CreateFile({'title': 'test.txt'})
file.Upload()
file.SetContentString('hello')
file.Upload()
# file.SetContentString()
# file.Upload() # This throws an exception.
# I added the script below.
res = requests.patch(
    "https://www.googleapis.com/upload/drive/v3/files/" + file['id'] + "?uploadType=multipart",
    headers={"Authorization": "Bearer " + gauth.credentials.token_response['access_token']},
    files={
        'data': ('metadata', '{}', 'application/json'),
        'file': io.BytesIO()
    }
)
print(res.text)
References:
PyDrive
Files: update
I have a Google drive repository where I used to upload lots of files. This time I would like to download something from this same repository.
The following code works to download a file with file_id:
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
file_id = '23242342345YdJqjvKLVbenO22FeKcL'
request = team_drive.DRIVE.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print ("Download %d%%." % int(status.progress() * 100))
fh.seek(0)
with open('test.csv', 'wb') as f:
    shutil.copyfileobj(fh, f, length=131072)
I would like to do the same but download a file from a folder this time. I tried the following code to display files in a given folder with folder_id. But it does not work.
folder_id = '13223232323237jWuf3__hKAG18jVo'
results = team_drive.DRIVE.files().list(q="mimeType='application/vnd.google-apps.spreadsheet' and parents in '"+folder_id+"'",fields="nextPageToken, files(id, name)",pageSize=400).execute()
Should the code work? I got an empty list. Any contribution would be appreciated
I believe your goal and situation are as follows.
You want to download the most recently modified Google Spreadsheet from a specific folder in your shared Drive, in XLSX format.
You want to achieve this using googleapis for Python.
You have already been able to download a file using the Drive API.
For this, I would like to propose the following sample script. The flow of the script is as follows.
1. Retrieve the latest Google Spreadsheet from the specific folder in the shared Drive.
   For this, I use results = DRIVE.files().list(pageSize=1, fields="files(modifiedTime,name,id)", orderBy="modifiedTime desc", q="'" + folder_id + "' in parents and mimeType = 'application/vnd.google-apps.spreadsheet'", supportsAllDrives=True, includeItemsFromAllDrives=True).execute()
   This retrieves the Google Spreadsheet with the latest modified time.
2. Retrieve the file ID of the latest Google Spreadsheet.
   In this case, results.get('files', [])[0]['id'] is the file ID.
3. Download the Google Spreadsheet in XLSX format.
   In this case, DRIVE.files().export_media(fileId=file_id, mimeType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet') is used.
Putting this flow together, the sample script is as follows.
Sample script:
folder_id = "###" # Please set the folder ID.
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
results = DRIVE.files().list(pageSize=1, fields="files(modifiedTime,name,id)", orderBy="modifiedTime desc", q="'" + folder_id + "' in parents and mimeType = 'application/vnd.google-apps.spreadsheet'", supportsAllDrives=True, includeItemsFromAllDrives=True).execute()
items = results.get('files', [])
if items:
    file_id = items[0]['id']
    file_name = items[0]['name']
    request = DRIVE.files().export_media(fileId=file_id, mimeType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
    fh = io.FileIO(file_name + '.xlsx', mode='wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print('Download %d%%.' % int(status.progress() * 100))
Note:
From your script, I couldn't tell how DRIVE and team_drive.DRIVE are related. Since your script defines DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http())), I used DRIVE. If this cannot be used, please modify it.
Reference:
Files: list in Drive API v3
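Since the q string is easy to get wrong by hand (the folder ID goes before "in parents", and values are single-quoted), here is a small sketch of a builder for it. build_folder_query is a hypothetical helper of mine, the trashed = false clause is my own addition and not part of the script above, and the escaping reflects my understanding of the Drive query syntax.

```python
def build_folder_query(folder_id, mime_type=None, include_trashed=False):
    """Build a Drive API 'q' string that selects the children of one folder.

    Hypothetical helper for illustration; pass the result as q=... to
    DRIVE.files().list().
    """
    def quote(value):
        # Query values are single-quoted; escape backslashes and single
        # quotes inside them with a backslash.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # The folder ID comes first: '<id>' in parents (not: parents in '<id>').
    clauses = [quote(folder_id) + " in parents"]
    if mime_type is not None:
        clauses.append("mimeType = " + quote(mime_type))
    if not include_trashed:
        clauses.append("trashed = false")
    return " and ".join(clauses)
```

For a shared drive you would still pass supportsAllDrives=True and includeItemsFromAllDrives=True to list(), as in the sample script.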
I use this function to get the URLs of files in a Drive folder:
from google.colab import auth
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from pydrive.files import ApiRequestError

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
myDrive = GoogleDrive(gauth)

def getGDriveFileLinks(drive, folder_id, mime_type=None):
    """
    Returns a list of dicts, each holding a file name and its shareable link
    drive: a GoogleDrive object with credentials https://pythonhosted.org/PyDrive/pydrive.html?highlight=googledrive#pydrive.drive.GoogleDrive
    folder_id: the ID of the folder containing the files (grab it from the folder's URL)
    mime_type (optional): the identifier of the filetype https://developers.google.com/drive/api/v3/mime-types,
    https://www.iana.org/assignments/media-types/media-types.xhtml
    """
    file_list = []
    mime_type_query = "mimeType='{}' and ".format(mime_type) if mime_type is not None else ''
    files = drive.ListFile({'q': mime_type_query + "'{}' in parents".format(folder_id)}).GetList()
    for file in files:
        keys = file.keys()
        if 'alternateLink' in keys:
            link = file['alternateLink']
        elif 'webContentLink' in keys:
            link = file['webContentLink']
        elif 'webViewLink' in keys:
            link = file['webViewLink']
        else:
            # No link available yet: try to make the file readable by anyone.
            try:
                file.InsertPermission({
                    'type': 'anyone',
                    'value': 'anyone',
                    'role': 'reader'})
                link = file['alternateLink']
            except (HttpError, ApiRequestError):
                link = 'Insufficient permissions for this file'
        if 'title' in keys:
            name = file['title']
        else:
            name = file['id']
        file_list.append({'name': name, 'link': link})
    return file_list

print(getGDriveFileLinks(myDrive, 'folder_id'))
Then, the URL can be used to retrieve the file using pydrive.
If anyone uses Ruby and needs help, this command returns an IO object:
drive.export_file(sheet_id, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
ref: https://googleapis.dev/ruby/google-api-client/latest/Google/Apis/DriveV3/DriveService.html#export_file-instance_method
When I upload data using the following code, the data vanishes once I get disconnected.
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
Please suggest ways to upload my data so that it remains intact even after days of disconnection.
I keep my data stored permanently in a .zip file in Google Drive, and copy it to the Google Colab VM using the following code.
Paste it into a cell and change the file_id. You can find the file_id in the URL of the file in Google Drive. (Right-click the file -> Get shareable link -> take the part of the URL after open?id=)
#@title uploader
file_id = "1BuM11fJJ1qdZH3VbQ-GwPlK5lAvXiNDv" #@param {type:"string"}
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1gLBqEWEBQDYbKCDigHnUXNTkzl-OslSO
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
fileId = drive.CreateFile({'id': file_id }) #DRIVE_FILE_ID is file id example: 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
print(fileId['title'])
fileId.GetContentFile(fileId['title']) # Save Drive file as a local file
!unzip {fileId['title']}
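If you would rather not shell out to unzip (for example, to reuse the cell outside a notebook), the extraction step can be sketched with the standard library instead. extract_archive is a hypothetical helper name, and the destination directory is an assumption; adjust it to wherever your notebook expects the data.

```python
import os
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the sorted member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())
```

In the cell above, extract_archive(fileId['title']) would stand in for the !unzip line.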
Keeping data in GDrive is good (@skaem).
If your data contains code, I suggest simply running git clone on your source repository from GitHub (or any other code versioning service) at the beginning of your Colab notebook.
This way, you can develop offline and run your experiments in the cloud whenever you need, with up-to-date code.
I am using Python 2.7 and I am trying to upload a file (*.txt) into a folder that is shared with me.
So far I have been able to upload it to my Drive, but how do I set which folder it goes into? I am given the URL of the folder where the file must be placed.
Thank you.
This is my code so far:
def Upload(file_name, file_path, upload_url):
    upload_url = upload_url
    client = gdata.docs.client.DocsClient(source=upload_url)
    client.api_version = "3"
    client.ssl = True
    client.ClientLogin(username, passwd, client.source)
    filePath = file_path
    newResource = gdata.docs.data.Resource(filePath,file_name)
    media = gdata.data.MediaSource()
    media.SetFileHandle(filePath, 'mime/type')
    newDocument = client.CreateResource(
        newResource,
        create_uri=gdata.docs.client.RESOURCE_UPLOAD_URI,
        media=media
    )
The API you are using is deprecated. Use google-api-python-client instead.
Follow this official Python quickstart guide to upload a file to a folder. Additionally, send the parents parameter in the request body like this: body['parents'] = [{'id': parent_id}]
Or you can use PyDrive, a Python wrapper library that simplifies much of the work of dealing with the Google Drive API. The whole code is as simple as this:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
drive = GoogleDrive(gauth)
f = drive.CreateFile({'parents': [{'id': parent_id}]}) # parent_id is the ID of the target folder
f.SetContentFile('cat.png') # Read local file
f.Upload() # Upload it
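One detail that trips people up is that the shape of the parents field differs between Drive API versions. A small sketch of the two shapes, assuming you build the metadata body yourself; file_metadata is a hypothetical helper name, not a library function.

```python
def file_metadata(name, parent_id, api_version="v3"):
    """Build the metadata body that places a new file inside parent_id."""
    if api_version == "v3":
        # v3: 'name' plus a plain list of folder IDs.
        return {"name": name, "parents": [parent_id]}
    if api_version == "v2":
        # v2 (which PyDrive wraps): 'title' plus a list of {'id': ...} dicts.
        return {"title": name, "parents": [{"id": parent_id}]}
    raise ValueError("unsupported Drive API version: " + api_version)
```

With PyDrive, drive.CreateFile(file_metadata('cat.png', parent_id, 'v2')) would be the equivalent call; with google-api-python-client on v3 you would pass the v3 dict as body= to files().create().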