My goal is to get a list of all of the files and folders in everyone's Google Drive. I'm starting by making sure the script works on my own. I have read the Drive REST API documentation cover to cover and eventually found this code, which can also be found here.
from __future__ import print_function
import httplib2
import os
import sys
from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage
reload(sys)  # Python 2 only: reload sys so setdefaultencoding is available
sys.setdefaultencoding('utf-8')
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/drive-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'
def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'drive-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else:  # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials
def main():
    """Shows basic usage of the Google Drive API.

    Creates a Google Drive API service object and outputs the names and
    MIME types of the listed files.
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)

    results = service.files().list(
        pageSize=1000, fields="nextPageToken, files(mimeType, name)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print('{0} ({1})'.format(item['name'], item['mimeType']))


if __name__ == '__main__':
    main()
My problem is with nextPageToken and how to use it properly. The maximum pageSize is 1000, so I must loop: fetch nextPageToken from the resulting JSON and feed it back into the original request to get the next 1000 results. How do I do this?
Let's look at the Google Drive API documentation for the Files:list method.
In the fields of your request you asked for nextPageToken, so the result will contain the token for the next page (if a next page exists).
The result will look something like this:
{
  ...,
  "nextPageToken": "V1*3|0|XXXXXX",
  "files": [
    {
      ...
    },...
  ]
}
You can extract the nextPageToken value like this:
token = results.get('nextPageToken', None)
The list method can take the string parameter pageToken:
The token for continuing a previous list request on the next page.
This should be set to the value of 'nextPageToken' from the previous
response.
Just set the parameter pageToken in the next request to get the next page of results:
results = service.files().list(
    pageSize=1000,
    pageToken=token,
    fields="nextPageToken, files(mimeType, name)").execute()
items = results.get('files', [])
Now you can easily make a loop to get all the results.
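A minimal sketch of that loop, reusing the service object and fields from the snippets above (all_items is just an illustrative accumulator):

# Keep requesting pages until the response no longer contains a nextPageToken.
all_items = []
page_token = None
while True:
    results = service.files().list(
        pageSize=1000,
        pageToken=page_token,
        fields="nextPageToken, files(mimeType, name)").execute()
    all_items.extend(results.get('files', []))
    page_token = results.get('nextPageToken')
    if page_token is None:
        break

print('Total files listed: {0}'.format(len(all_items)))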
I will try to demonstrate the concept for you, but you'll do the implementation in Python. The short answer is nextPageToken: it enables you to retrieve the results from the next page.
When you perform a GET request, a nextPageToken is included in the response whenever more results remain, so if you have 1000 results but only display 20 per page, you can fetch the remaining 980 files using nextPageToken.
Run this URL and you'll see something like:
"kind": "drive#fileList",
"nextPageToken": "V1*3|0|CjkxOHY2aDdROE9JYkJGWUJEaU5Ybm1OVURSemJTcWFMa2lRQlVJSnVxYmI2YkYzMmhnVHozeWkwRnASBxCqqcG4kis",
"incompleteSearch": false,
The value of nextPageToken here is what you use to get to the next page. When you get to the next page and there are still more results, a new nextPageToken is generated until you have retrieved all of the results.
You must loop while the token for the next page is not null, as in the code below.
Do not forget to install the required packages first:
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
Copy and paste this code (do not forget to change your paths and your Google Drive folder ID at the end):
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
# If modifying these scopes, delete the file token.pickle.
SCOPES = [
    'https://www.googleapis.com/auth/spreadsheets',
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/drive"
]
# FOR AUTHENTICATION
def authenticate():
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'YOUR PATH FOR THE CREDENTIALS JSON/credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        with open('YOUR PATH /token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds)
    return service
# LISTS TO TAKE ALL FILES AND IDs FROM SPECIFIC FOLDER
listFilesDrive=[]
line = []
# TO TAKE ALL FILES FROM SPECIFIC FOLDER
def listFilesFromGoogleFolder(IDFolder):
    service = authenticate()

    # Call the Drive v3 API
    results = service.files().list(q="'{}' in parents".format(IDFolder),
                                   fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    # TAKE THE TOKEN FOR THE NEXT PAGE (IF THERE IS NONE, THIS VALUE WILL BE NULL)
    token = results.get('nextPageToken', None)

    if not items:
        print('No files found.')
    else:
        print('Files:')
        line = []
        for item in items:
            # PUT THE FIRST PAGE IN A LIST ->> "listFilesDrive"
            arquivo = item['name']
            IDarquivo = item['id']
            line.append(arquivo)
            line.append(IDarquivo)
            listFilesDrive.append(line)
            line = []
            print(u'{0} ({1})'.format(item['name'], item['id']))

    # LOOP WHILE THE TOKEN FOR THE NEXT PAGE IS NOT NULL
    while token is not None:
        results = service.files().list(q="'{}' in parents".format(IDFolder),
                                       pageToken=token,
                                       fields="nextPageToken, files(id, name)").execute()
        items = results.get('files', [])
        # TAKE A NEW TOKEN FOR THE NEXT PAGE; IF THERE IS NONE, THIS TOKEN WILL BE NULL ("None")
        token = results.get('nextPageToken', None)

        if not items:
            print('No files found.')
        else:
            print('Files:')
            for item in items:
                arquivo = item['name']
                IDarquivo = item['id']
                line.append(arquivo)
                line.append(IDarquivo)
                listFilesDrive.append(line)
                line = []
                print(u'{0} ({1})'.format(item['name'], item['id']))

    print(len(listFilesDrive))
    print(listFilesDrive)
# put your specific information
if __name__ == '__main__':
    FolderIDFromGDrive = 'YOUR FOLDER ID'
    listFilesFromGoogleFolder(FolderIDFromGDrive)
I had quite a bit of trouble with this. I didn't read the example closely enough to notice that nextPageToken and newStartPageToken were not the same thing.
I split up the functions a little and added a loop. Basically, return the startPageToken and loop over the same function / call the function as required.
from __future__ import print_function
import httplib2
import os
#julian
import time
from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/drive-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Drive API Python Quickstart'
def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir, 'drive-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else:  # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials
def main():
    """Shows basic usage of the Google Drive API.

    Creates a Google Drive API service object and polls the Changes API
    for changes every 10 seconds.
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v3', http=http)

    saved_start_page_token = StartPage_v3(service)
    saved_start_page_token = DetectChanges_v3(service, saved_start_page_token)
    starttime = time.time()
    while True:
        saved_start_page_token = DetectChanges_v3(service, saved_start_page_token)
        time.sleep(10.0 - ((time.time() - starttime) % 10.0))
def StartPage_v3(service):
    response = service.changes().getStartPageToken().execute()
    print('Start token: %s' % response.get('startPageToken'))
    return response.get('startPageToken')
def DetectChanges_v3(service, saved_start_page_token):
    # Begin with our last saved start token for this user or the
    # current token from getStartPageToken()
    page_token = saved_start_page_token
    while page_token is not None:
        response = service.changes().list(pageToken=page_token, spaces='drive').execute()
        for change in response.get('changes'):
            # Process change
            mimeType = change.get('file').get('mimeType')
            print('Change found for: %s' % change)
        if 'newStartPageToken' in response:
            # Last page, save this token for the next polling interval
            saved_start_page_token = response.get('newStartPageToken')
        page_token = response.get('nextPageToken')
    return saved_start_page_token
if __name__ == '__main__':
    main()
I installed Google Drive on my computer (Windows 11 x64) to drive G:\.
I want to be able to get a sharing link for a specific file/folder that I have a path to.
Google Drive may contain duplicate file/folder names.
How can I do this with Python?
Thanks in advance.
Edited:
I managed to get a link for a specific file name, but now I have a problem if there are two or more files with the same name in Google Drive.
For example, I want the link for G:\RootFolder\Subfolder1\Subfolder2\myfile.txt, but there is another file with the same name at G:\RootFolder\Subfolder3\Subfolder4\Subfolder5\myfile.txt. How can I get the link only for G:\RootFolder\Subfolder1\Subfolder2\myfile.txt?
from Google import Create_Service
CLIENT_SECRET_FILE = 'client-secret.json'
API_NAME = 'drive'
API_VERSION = 'v3'
SCOPES = ['https://www.googleapis.com/auth/drive']
service = Create_Service(CLIENT_SECRET_FILE, API_NAME, API_VERSION, SCOPES)
# Update Sharing Setting
file_id = '<file id>'
request_body = {
    'role': 'reader',
    'type': 'anyone'
}
response_permission = service.permissions().create(
    fileId=file_id,
    body=request_body
).execute()
print(response_permission)

# Print Sharing URL
response_share_link = service.files().get(
    fileId=file_id,
    fields='webViewLink'
).execute()
print(response_share_link)

# Remove Sharing Permission
service.permissions().delete(
    fileId=file_id,
    permissionId='anyoneWithLink'
).execute()
I managed to create a script that works for me.
Packages: pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
from __future__ import print_function
import argparse
import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
# According to the guide https://developers.google.com/drive/api/quickstart/python
# TODO: First time - create a "credentials.json" file: https://developers.google.com/workspace/guides/create-credentials#oauth-client-id
# TODO: First time - enable the Google Drive API: https://developers.google.com/drive/api/guides/enable-drive-api
def is_folder_name_in_parents(service, parents, folder_name):
    for parent_id in parents:
        response = service.files().get(fileId=parent_id, fields='name').execute()
        if folder_name == response.get("name"):
            return parent_id
    return None
def is_correct_file_path(service, folder_path, parents, root_folder_name, root_folder_id):
    folder_name = os.path.basename(folder_path)
    if folder_name == root_folder_name and root_folder_id in parents:
        return True
    parent_id = is_folder_name_in_parents(service=service, parents=parents, folder_name=folder_name)
    if not parent_id:
        return False
    response = service.files().get(fileId=parent_id, fields='parents').execute()
    new_parents = response.get("parents")
    return is_correct_file_path(service=service,
                                folder_path=os.path.dirname(folder_path),
                                parents=new_parents,
                                root_folder_name=root_folder_name,
                                root_folder_id=root_folder_id)
def get_sharing_link_by_path(root_folder_name, root_folder_id, filepath):
    """Returns the webViewLink of the file at filepath under the given root folder."""
    # If modifying these scopes, delete the file token.json.
    SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']

    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    try:
        service = build('drive', 'v3', credentials=creds)
        filename = os.path.basename(filepath)
        folder_path = os.path.dirname(filepath)
        page_token = None
        while True:
            response = service.files().list(
                q=f"name='{filename}'",
                spaces='drive',
                fields='nextPageToken, files(name, webViewLink, parents)',
                pageToken=page_token
            ).execute()
            print(f"There are {len(response.get('files', []))} results in Google Drive for: {filename}")
            for file in response.get('files', []):
                if "parents" in file.keys():
                    if is_correct_file_path(service=service,
                                            folder_path=folder_path,
                                            parents=file["parents"],
                                            root_folder_name=root_folder_name,
                                            root_folder_id=root_folder_id):
                        if 'webViewLink' in file.keys():
                            print(f"File path: {filename}\nWeb View Link: {file['webViewLink']}")
                            return file['webViewLink']
                        print(f"Web view link for this file not found: {filepath}")
                        return None
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                print(f"File not found: {filepath}")
                return None
    except HttpError as error:
        # TODO(developer) - Handle errors from drive API.
        print(f'An error occurred: {error}')
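A hypothetical call, using placeholder values for the root folder name, its Drive folder ID, and the Windows-style path from the question:

if __name__ == '__main__':
    # All three values below are placeholders; substitute your own.
    link = get_sharing_link_by_path(
        root_folder_name='RootFolder',
        root_folder_id='YOUR_ROOT_FOLDER_ID',
        filepath=r'G:\RootFolder\Subfolder1\Subfolder2\myfile.txt')
    print(link)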
I am trying to fetch the names of all the files in my Google Drive, but the problem is that I can only fetch a limited number of files. How can I remove this limitation and fetch all files?
Here's the code provided by Google on their site.
I am using Google Drive API v3.
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive.metadata.readonly']
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('drive', 'v3', credentials=creds)

    # Call the Drive v3 API
    results = service.files().list(
        pageSize=10, fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])

    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            print(u'{0} ({1})'.format(item['name'], item['id']))


if __name__ == '__main__':
    main()
In this code there is a parameter pageSize whose value is 10. If I increase its value I can fetch more files, but I want to fetch all files, so how can I do that?
Files.list has an optional parameter called pageSize:
pageSize integer The maximum number of files to return per page. Partial or empty result pages are possible even before the end of the files list has been reached. Acceptable values are 1 to 1000, inclusive. (Default: 100)
You have set yours to pageSize=10; you should set it to 1000 and then use the nextPageToken to fetch the next set of rows if there are more.
I am not a Python developer; the following code is a guess.
page_token = None
while True:
    response = service.files().list(pageToken=page_token,
        pageSize=1000, fields="nextPageToken, files(id, name)").execute()
    for item in response.get('files', []):
        print(u'{0} ({1})'.format(item['name'], item['id']))
    page_token = response.get('nextPageToken')
    if page_token is None:
        break
I'm trying to download files from my Google Drive folder using Python. I have tried the script below from "How to download specific Google Drive folder using Python?" and it works for me, but I have a lot of files in my Google Drive folder and would like to skip the files already downloaded to my computer. Is there any way I can make that possible?
from __future__ import print_function
import pickle
import os
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage
from apiclient.http import MediaFileUpload, MediaIoBaseDownload
import io
from apiclient import errors
from apiclient import http
import logging
from apiclient import discovery
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']
# To list folders
def listfolders(service, filid, des):
    results = service.files().list(
        pageSize=1000, q="\'" + filid + "\'" + " in parents",
        fields="nextPageToken, files(id, name, mimeType)").execute()
    # logging.debug(folder)
    folder = results.get('files', [])
    for item in folder:
        if str(item['mimeType']) == str('application/vnd.google-apps.folder'):
            if not os.path.isdir(des + "/" + item['name']):
                os.mkdir(path=des + "/" + item['name'])
            print(item['name'])
            listfolders(service, item['id'], des + "/" + item['name'])  # loop until the files are found
        else:
            downloadfiles(service, item['id'], item['name'], des)
            print(item['name'])
    return folder
# To Download Files
def downloadfiles(service, dowid, name, dfilespath):
    request = service.files().get_media(fileId=dowid)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
    with io.open(dfilespath + "/" + name, 'wb') as f:
        fh.seek(0)
        f.write(fh.read())
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)  # credentials.json downloaded from the Drive API
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('drive', 'v3', credentials=creds)

    # Call the Drive v3 API
    Folder_id = "'PASTE YOUR SHARED FOLDER ID'"  # Enter the downloadable folder ID from the shared link
    results = service.files().list(
        pageSize=1000, q=Folder_id + " in parents", fields="nextPageToken, files(id, name, mimeType)").execute()
    items = results.get('files', [])

    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            if item['mimeType'] == 'application/vnd.google-apps.folder':
                if not os.path.isdir("Folder"):
                    os.mkdir("Folder")
                bfolderpath = os.getcwd() + "/Folder/"
                if not os.path.isdir(bfolderpath + item['name']):
                    os.mkdir(bfolderpath + item['name'])
                folderpath = bfolderpath + item['name']
                listfolders(service, item['id'], folderpath)
            else:
                if not os.path.isdir("Folder"):
                    os.mkdir("Folder")
                bfolderpath = os.getcwd() + "/Folder/"
                if not os.path.isdir(bfolderpath + item['name']):
                    os.mkdir(bfolderpath + item['name'])
                filepath = bfolderpath + item['name']
                downloadfiles(service, item['id'], item['name'], filepath)


if __name__ == '__main__':
    main()
I modified the parts below and it works! Thank you all for stopping by.
def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)  # credentials.json downloaded from the Drive API
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('drive', 'v3', credentials=creds)

    # Call the Drive v3 API
    Folder_id = "'PASTE YOUR SHARED FOLDER ID'"  # Enter the downloadable folder ID from the shared link
    File_Download_Path = "LOCATION OF THE FILES DOWNLOADED"  # Local folder to save the downloads into
    results = service.files().list(
        pageSize=1000, q=Folder_id + " in parents", fields="nextPageToken, files(id, name, mimeType)").execute()
    items = results.get('files', [])

    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            if not os.path.isdir(File_Download_Path):
                os.mkdir(File_Download_Path)
            if not item['name'] in os.listdir(File_Download_Path):
                downloadfiles(service, item['id'], item['name'], File_Download_Path)
        print("All files are downloaded.")


if __name__ == '__main__':
    main()
I tried the above method; if any folders are present inside, they are not downloaded and I get the error below:
raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1x7gHhVicv_fSI4t76Rh9pK6VOaBDRC6U?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'domain': 'global', 'reason': 'fileNotDownloadable', 'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'locationType': 'parameter', 'location': 'alt'}]">
Note: it downloads only the files that are there, not the folders.
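The 403 fileNotDownloadable error means the item is a Google Docs editors file (a Doc, Sheet, or Slide) that has no binary content; it has to be exported to a concrete format with files().export_media() instead of files().get_media(). A sketch of a combined helper, with an illustrative (not exhaustive) map of export MIME types; you would also want to append an appropriate file extension to the exported name, which is omitted here for brevity:

import io
from googleapiclient.http import MediaIoBaseDownload

# Illustrative export formats for Google Docs editors files.
EXPORT_TYPES = {
    'application/vnd.google-apps.document':
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'application/vnd.google-apps.spreadsheet':
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    'application/vnd.google-apps.presentation':
        'application/vnd.openxmlformats-officedocument.presentationml.presentation',
}

def downloadanyfile(service, file_id, name, mime_type, dfilespath):
    # Export Docs editors files, download everything else as-is.
    if mime_type in EXPORT_TYPES:
        request = service.files().export_media(fileId=file_id, mimeType=EXPORT_TYPES[mime_type])
    else:
        request = service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
    with io.open(dfilespath + "/" + name, 'wb') as f:
        f.write(fh.getvalue())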
How can we automate the process of getting access to Google spreadsheets?
Right now we use gspread and oauth2client.service_account to get access to Google spreadsheets. It works fine, but using these credentials means we have to manually share every single spreadsheet with the "client_email" from the credentials JSON file.
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
credentials = ServiceAccountCredentials.from_json_keyfile_name('path.json', scope)
gs = gspread.authorize(credentials)
That works, but how should it be modified?
The desired outcome is: somebody shares a spreadsheet with me and I can start working with it immediately in Python. Is that possible? Maybe we can use triggers from the incoming notification emails about sharing, or something similar?
You can try this script. It has a few sections we can differentiate:
Requesting access to Drive and Gmail. As you can see, we use the full drive scope instead of drive.file. This is because there is an existing bug that causes drive.file to fail (1), so in the meantime we have to use the full scope.
from __future__ import print_function
import pickle
import sys
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive',
          'https://www.googleapis.com/auth/gmail.modify']

creds = None
# The file token.pickle stores the user's access and refresh tokens, and is
# created automatically when the authorization flow completes for the first
# time.
if os.path.exists('token.pickle'):
    with open('token.pickle', 'rb') as token:
        creds = pickle.load(token)
# If there are no (valid) credentials available, let the user log in.
if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(
            'credentials.json', SCOPES)
        creds = flow.run_local_server()
    # Save the credentials for the next run
    with open('token.pickle', 'wb') as token:
        pickle.dump(creds, token)

mail_service = build('gmail', 'v1', credentials=creds)
drive_service = build('drive', 'v3', credentials=creds)
Declaring some variables. There is no issue here: just declaring the variables where we will keep the mail IDs, the file names, and the file names formatted for our needs.
mail_ids = []
file_name = []
name_string = []
Get the emails. We will only take the unread emails from
drive-shares-noreply. After this we will mark them as “read” so we
won’t take them the next time we execute the script.
def get_emails(mail_ids):
    user_id = 'me'  # Or your email
    query = 'from:drive-shares-noreply@google.com, is:UNREAD'  # Search unread mails from Drive shares
    response = mail_service.users().messages().list(userId=user_id, q=query).execute()
    items = response.get('messages', [])
    if not items:
        print('No unread mails found')
        sys.exit()
    else:
        for item in items:
            mail_ids.append(item['id'])
        for mail_id in mail_ids:
            mail_service.users().messages().modify(userId=user_id, id=mail_id, body={"removeLabelIds": ["UNREAD"]}).execute()  # Mark the mails as read
Get the file names of the emails. The syntax of the Subject of the sharing sheets email is “Filename - Invitation to edit”, so we will take the subject of each email, and we will format the string later.
def get_filename(mail_ids, file_name):
    user_id = 'me'
    headers = []
    for mail_id in mail_ids:
        response = mail_service.users().messages().get(userId=user_id, id=mail_id, format="metadata", metadataHeaders="Subject").execute()
        items = response.get('payload', [])
        headers.append(items['headers'])
    length = len(headers)
    for i in range(length):
        file_name.append(headers[i][0]['value'])

def process_name(file_name, name_string):
    for name in file_name:
        name_string.append(str(name).replace(" - Invitation to edit", ""))
Give permissions to the client_email
def give_permissions(name_string):
    for name in name_string:
        body = "'{}'".format(name)
        results = drive_service.files().list(q="name = " + body).execute()
        items = results.get('files', [])
        if not items:
            print('No files found.')
            sys.exit()
        else:
            print('Files:')
            for item in items:
                print(u'{0} ({1})'.format(item['name'], item['id']))
                file_id = item['id']
                user_permission = {
                    'type': 'user',
                    'role': 'writer',
                    'emailAddress': 'your_client_email'
                }
                drive_service.permissions().create(body=user_permission, fileId=file_id).execute()
And then we just have to call the functions
get_emails(mail_ids)
get_filename(mail_ids, file_name)
process_name(file_name, name_string)
give_permissions(name_string)
There is no way to trigger this script for each new email received, but you can trigger it with a timer or something like that and it will search for new emails; a minimal polling sketch follows the note below.
(1) The drive.file scope only works with certain files, according to the last update of the documentation
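A minimal polling sketch of that idea, assuming get_emails is adjusted to return instead of calling sys.exit() when there are no unread mails (the 10-minute interval is an arbitrary choice):

import time

while True:
    # Fresh working lists for each polling round.
    mail_ids, file_name, name_string = [], [], []
    get_emails(mail_ids)
    if mail_ids:
        get_filename(mail_ids, file_name)
        process_name(file_name, name_string)
        give_permissions(name_string)
    time.sleep(600)  # wait 10 minutes before checking again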
I am trying to download a spreadsheet file from my Drive to my computer.
I am able to authenticate, get the list of files, and even get metadata successfully.
But when I try to download the file, I get the following error:
downloading file starts
An error occurred: <HttpError 400 when requesting https://www.googleapis.com/dri
ve/v2/files/1vJetI_p8YEYiKvPVl0LtXGS5uIAx1eRGUupsXoh7UbI?alt=media returned "The
specified file does not support the requested alternate representation.">
downloading file ends
I couldn't find any such problem or question on SO, and the other methods or solutions provided on SO for downloading a spreadsheet are outdated; they have been deprecated by Google.
Here is the code I am using to download the file:
import httplib2
import os
from apiclient import discovery
import oauth2client
from oauth2client import client
from oauth2client import tools
from apiclient import errors
from apiclient import http
try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
#SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
SCOPES = 'https://www.googleapis.com/auth/drive'
CLIENT_SECRET_FILE = 'client_secrets.json'
APPLICATION_NAME = 'Drive API Quickstart'
def get_credentials():
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'drive-quickstart.json')

    store = oauth2client.file.Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else:  # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print 'Storing credentials to ' + credential_path
    return credentials
def main():
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    service = discovery.build('drive', 'v2', http=http)
    file_id = '1vJetI_p8YEYiKvPVl0LtXGS5uIAx1eRGUupsXoh7UbI'
    print "downloading file starts"
    download_file(service, file_id)
    print "downloading file ends"
def download_file(service, file_id):
    local_fd = open("foo.csv", "w+")
    request = service.files().get_media(fileId=file_id)
    media_request = http.MediaIoBaseDownload(local_fd, request)

    while True:
        try:
            download_progress, done = media_request.next_chunk()
        except errors.HttpError, error:
            print 'An error occurred: %s' % error
            return
        if download_progress:
            print 'Download Progress: %d%%' % int(download_progress.progress() * 100)
        if done:
            print 'Download Complete'
            return
if __name__ == '__main__':
    main()
Google spreadsheets don't have media. Instead, they have exportLinks. Get the file metadata, then look in exportLinks and pick an appropriate URL.
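A sketch of that approach against the Drive v2 API used above (the function and variable names are mine, and the chosen export MIME type is just one of the keys that may appear in exportLinks):

def download_spreadsheet(service, authorized_http, file_id, out_path='foo.xlsx'):
    # Fetch the file metadata and look at its exportLinks (Drive v2).
    metadata = service.files().get(fileId=file_id).execute()
    export_links = metadata.get('exportLinks', {})
    # Pick an appropriate representation; xlsx is one common choice.
    url = export_links.get(
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
    if not url:
        print('No suitable export link found; available: %s' % list(export_links))
        return
    # Download with the already-authorized httplib2 object from main().
    resp, content = authorized_http.request(url)
    if resp.status == 200:
        with open(out_path, 'wb') as f:
            f.write(content)
        print('Download Complete')
    else:
        print('An error occurred: %s' % resp.status)

In main() above this would be called as download_spreadsheet(service, http, file_id) instead of download_file(service, file_id).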
This code worked for me. I only had to download client_secret.json from the Google Developers dashboard and keep it in the same directory as the Python script.
In the list_of_lists variable I got a list with each row as a list.
import gspread
import json
from oauth2client.client import SignedJwtAssertionCredentials
json_key = json.load(open('client_secret.json'))
scope = ['https://spreadsheets.google.com/feeds']
credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)
gc = gspread.authorize(credentials)
sht1 = gc.open_by_key('<id_of_sheet>')
worksheet_list = sht1.worksheets()
worksheet = sht1.sheet1
list_of_lists = worksheet.get_all_values()
for row in list_of_lists:
    print row