Google Drive Python API: Uploading Large Files


I am writing a function to upload a file to Google Drive using the Python API client. It works for files up to 1 MB but does not work for a 10-MB file. When I try to upload a 10-MB file, I get an HTTP 400 error. Any help would be appreciated. Thanks.
Here is the output when I print the error:
An error occurred: <HttpError 400 when requesting https://www.googleapis.com/upload/drive/v3/files?alt=json&uploadType=resumable returned "Bad Request">
Here is the output when I print error.resp:
{'server': 'UploadServer',
'status': '400',
'x-guploader-uploadid': '...',
'content-type': 'application/json; charset=UTF-8',
'date': 'Mon, 26 Feb 2018 17:00:12 GMT',
'vary': 'Origin, X-Origin',
'alt-svc': 'hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"',
'content-length': '171'}
I'm unable to interpret this error. I have tried looking at the Google API Error Guide, but their explanation doesn't make sense to me, as all the parameters are the same as those in the requests with smaller files, which work.
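The printed headers show a content-length of 171, so the response body says more than "Bad Request". googleapiclient's HttpError keeps that raw body in error.content. Below is a minimal sketch for extracting the details. It assumes the usual Google JSON error shape, and describe_google_api_error is a hypothetical helper name.

```python
import json


def describe_google_api_error(content):
    """Extract (code, message, reasons) from a Google API error body.

    content is the raw response body, e.g. HttpError.content (bytes).
    Assumes the common shape:
    {"error": {"code": ..., "message": ..., "errors": [{"reason": ...}]}}
    """
    if isinstance(content, bytes):
        content = content.decode('utf-8')
    error = json.loads(content).get('error', {})
    # Each entry in "errors" usually carries a machine-readable "reason".
    reasons = [e.get('reason') for e in error.get('errors', [])]
    return error.get('code'), error.get('message'), reasons
```

In the except branch, something like logger.error(describe_google_api_error(error.content)) would log the server's own explanation next to error.resp.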
Here is my code:
def insert_file_only(service, name, description, filename='', parent_id='root', mime_type=GoogleMimeTypes.PDF):
    """ Insert new file.
    Using documentation from Google Python API as a guide:
    https://developers.google.com/api-client-library/python/guide/media_upload
    Args:
        service: Drive API service instance.
        name: Name of the file to create, including the extension.
        description: Description of the file to insert.
        filename: Filename of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    # Set the file meta data
    file_metadata = set_file_metadata(name, description, mime_type, parent_id)
    # Create media with correct chunk size
    if os.stat(filename).st_size <= 256*1024:
        media = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    else:
        media = MediaFileUpload(filename, mimetype=mime_type, chunksize=256*1024, resumable=True)
    file = None
    status = None
    start_from_beginning = True
    num_temp_errors = 0
    while file is None:
        try:
            if start_from_beginning:
                # Start from beginning
                logger.debug('Starting file upload')
                file = service.files().create(body=file_metadata, media_body=media).execute()
            else:
                # Upload next chunk
                logger.debug('Uploading next chunk')
                status, file = service.files().create(
                    body=file_metadata, media_body=media).next_chunk()
                if status:
                    logger.info('Uploaded {}%'.format(int(100*status.progress())))
        except errors.HttpError as error:
            logger.error('An error occurred: %s' % error)
            logger.error(error.resp)
            if error.resp.status in [404]:
                # Start the upload all over again
                start_from_beginning = True
            elif error.resp.status in [500, 502, 503, 504]:
                # Increment counter on number of temporary errors
                num_temp_errors += 1
                if num_temp_errors >= NUM_TEMP_ERROR_LIMIT:
                    return None
                # Call next chunk again
            else:
                return None
    permissions = assign_permissions(file, service)
    return file
UPDATE
I tried using a simpler pattern, taking the advice from @StefanE. However, I still get an HTTP 400 error for files over 1 MB. The new code looks like this:
request = service.files().create(body=file_metadata, media_body=media)
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        logger.info('Uploaded {}%'.format(int(100*status.progress())))
UPDATE 2
I found that the issue is the conversion of the file into a Google Doc, not the upload itself. I'm uploading an HTML file and converting it into a Google Doc. This works for files smaller than ~2 MB. When I upload the HTML file without trying to convert it, I don't get the above-mentioned error. This seems to correspond to the limit on this page. I don't know whether this limit can be increased.
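In Drive v3, conversion on upload is requested by giving the file metadata a Google Docs mimeType (application/vnd.google-apps.document) while the media keeps its source type (text/html). Below is a minimal sketch that requests conversion only below a size threshold. The threshold is an assumption taken from the ~2 MB observed above, not a documented value, and make_html_upload_metadata is a hypothetical helper name.

```python
# Metadata mimeType that asks Drive v3 to convert the upload into a Google Doc.
GOOGLE_DOC_MIME = 'application/vnd.google-apps.document'


def make_html_upload_metadata(name, size_bytes, convert_limit_bytes):
    """Build files().create() metadata for an HTML upload.

    Conversion is requested only when the file is smaller than
    convert_limit_bytes (a caller-supplied, assumed limit); larger files
    are stored as ordinary text/html files.
    """
    metadata = {'name': name}
    if size_bytes < convert_limit_bytes:
        metadata['mimeType'] = GOOGLE_DOC_MIME
    return metadata
```

The size could come from os.path.getsize(path), and the media would still be MediaFileUpload(path, mimetype='text/html', resumable=True) in both cases.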

I see some issues with your code.
First, you have a while loop that continues as long as file is None, and the first thing you do inside it is set the value of file, i.e. it will only loop once.
Second, you have a variable start_from_beginning, but it is never set to False anywhere in the code, so the else branch will never be executed.
Looking at Google's documentation, their sample code is a lot more straightforward:
media = MediaFileUpload('pig.png', mimetype='image/png', resumable=True)
request = farm.animals().insert(media_body=media, body={'name': 'Pig'})
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print("Uploaded %d%%." % int(status.progress() * 100))
print("Upload Complete!")
Here you loop while response is None; it stays None until the upload has finished.
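The question's first version also tried to retry transient 5xx errors. Below is a minimal sketch that combines that idea with Google's loop. It assumes, as the question's code does, that a failed chunk can be retried by calling next_chunk() again on the same request. upload_with_retries and is_retryable are hypothetical names.

```python
import time


def upload_with_retries(request, is_retryable, max_retries=5, sleep=time.sleep):
    """Drive a resumable upload to completion, retrying transient failures.

    request: anything whose next_chunk() returns (status, response), such as
    an un-executed service.files().create(..., media_body=...) request.
    is_retryable(exc): decides whether an exception is worth retrying.
    """
    response = None
    failures = 0
    while response is None:
        try:
            status, response = request.next_chunk()
        except Exception as exc:
            if not is_retryable(exc) or failures >= max_retries:
                raise
            failures += 1
            sleep(2 ** failures)  # exponential backoff: 2, 4, 8, ... seconds
            continue
        failures = 0  # a chunk went through, so reset the backoff
        if status:
            print('Uploaded %d%%.' % int(status.progress() * 100))
    return response
```

With google-api-python-client, is_retryable might be lambda e: isinstance(e, HttpError) and e.resp.status in (500, 502, 503, 504), importing HttpError from googleapiclient.errors. next_chunk() also accepts a num_retries argument that retries some failures internally, which may be enough on its own.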

Related

python google api v3 Error on update file

I'm trying to use the Google Drive API v3 in Python to update a file on Google Drive, using code from the official Google instructions.
But I receive an error:
The resource body includes fields which are not directly writable.
How can this be solved?
Here is the code I'm trying to use:
try:
    # First retrieve the file from the API.
    file = service.files().get(fileId='id_file_in_google_drive').execute()
    # File's new metadata.
    file['title'] = 'new_title'
    file['description'] = 'new_description'
    file['mimeType'] = 'application/pdf'
    # File's new content.
    media_body = MediaFileUpload(
        '/home/my_file.pdf',
        mimetype='application/pdf',
        resumable=True)
    # Send the request to the API.
    updated_file = service.files().update(
        fileId='id_file_in_google_drive',
        body=file,
        media_body=media_body).execute()
    return updated_file
except errors.HttpError as error:
    print('An error occurred: %s' % error)
    return None

The issue is that you are reusing the object you got back from the files.get method. The files.update method uses HTTP PATCH semantics, which means every field you send is going to be updated. The object returned by files.get contains fields of the file resource that are not writable (such as id), so when you send it back to files.update you are trying to update fields which cannot be updated.
file = service.files().get(fileId='id_file_in_google_drive').execute()
# File's new metadata.
file['title'] = 'new_title'
file['description'] = 'new_description'
file['mimeType'] = 'application/pdf'
What you should do instead is create a new object containing only the fields you want to update, and pass that to files.update. Remember that in Google Drive v3 the field is name, not title.
file_metadata = {'name': 'new_title', 'description': 'new description'}
updated_file = service.files().update(
    fileId='id_file_in_google_drive',
    body=file_metadata,
    media_body=media_body).execute()
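One way to avoid carrying v2 habits into v3 is to build the PATCH body from scratch, from keyword arguments only. Below is a minimal sketch, assuming only the title-to-name rename matters here. v3_update_body and V2_TO_V3 are hypothetical names, and the mapping is deliberately incomplete.

```python
# Drive v2 -> v3 field renames handled by this sketch (intentionally partial).
V2_TO_V3 = {'title': 'name'}


def v3_update_body(**changes):
    """Return a fresh update body containing only the fields given.

    None values are skipped so optional arguments can be passed through,
    and v2-style keys listed in V2_TO_V3 are renamed to their v3 names.
    """
    body = {}
    for key, value in changes.items():
        if value is not None:
            body[V2_TO_V3.get(key, key)] = value
    return body
```

It would be used as body=v3_update_body(title='new_title', description='new description') in the files().update() call above, so nothing from files.get ever reaches the request.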

Status parameter not working when using python blogger api

I'm trying to use google-api-python-client 1.12.5 with service account auth under Python 3.8. It seems that when I specify the status parameter, Google responds with an HTTP 404. I can't figure out why. I also looked in the docs, but I can't relate anything there to this error.
I have pasted my code. The error is happening in the third call.
This is the code:
from google.oauth2 import service_account
from googleapiclient.discovery import build
SCOPES = ['https://www.googleapis.com/auth/blogger']
SERVICE_ACCOUNT_FILE = 'new_service_account.json'
BLOG_ID = '<your_blog_id>'
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('blogger', 'v3', credentials=credentials)
p = service.posts()
# FIRST
promise = p.list(blogId=BLOG_ID)
result = promise.execute()
# SECOND
promise = p.list(blogId=BLOG_ID, orderBy='UPDATED')
result = promise.execute()
#THIRD
promise = p.list(blogId=BLOG_ID, orderBy='UPDATED', status='DRAFT')
result = promise.execute() # <===== ERROR HAPPENS HERE!!!!
service.close()
And this is the traceback:
Traceback (most recent call last):
File "/home/madtyn/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/202.7660.27/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/madtyn/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/202.7660.27/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/madtyn/PycharmProjects/blogger/main.py", line 24, in <module>
result = promise.execute()
File "/home/madtyn/venvs/blogger/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/madtyn/venvs/blogger/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?orderBy=UPDATED&status=DRAFT&alt=json returned "Not Found">
python-BaseException

I can reproduce this issue: adding status=DRAFT returns 404, but every other filter works.
Tried with a service account and your code: 404
Tried with an API key, like this: result = requests.get('https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?status=DRAFT&orderBy=UPDATED&alt=json&key=<api_key>'): 404
Extracted the "access_token" from the service account (credentials.token after a call): result = requests.get('https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?status=DRAFT&orderBy=UPDATED&alt=json&access_token=<extracted_service_account_token>'): 404
Strangely, though, if I use the access_token issued by "Try this API" here: https://developers.google.com/blogger/docs/3.0/reference/posts/list?apix_params={"blogId"%3A"blog_id"%2C"orderBy"%3A"UPDATED"%2C"status"%3A["DRAFT"]%2C"alt"%3A"json"} it works!
Using that token with requests gives me my blog posts in draft status.
Just copy/paste the raw Authorization header into this script:
import requests
blog_id = '<blog_id>'
headers = {
'Authorization' : 'Bearer <replace_here>'
}
# Using only Authorization header
result = requests.get(
'https://blogger.googleapis.com/v3/blogs/%s/posts?status=DRAFT&orderBy=UPDATED&alt=json' % (blog_id),
headers=headers
)
print(result)
# This should print DRAFT if you have at least one draft post
print(result.json()['items'][0]['status'])
# Using the "access_token" query parameter, built from the Authorization header with the "Bearer " prefix stripped
result = requests.get(
    'https://blogger.googleapis.com/v3/blogs/%s/posts?status=DRAFT&orderBy=UPDATED&alt=json&access_token=%s'
    % (blog_id, headers['Authorization'][len('Bearer '):]))
print(result)
# This should print DRAFT if you have at least one draft post
print(result.json()['items'][0]['status'])
Results I have currently:
The bug doesn't seem to come from the library but rather from the token's permissions. However, I also used the console normally to generate credentials, just as you did.
In conclusion, I think it's either a bug or intentional on Google's part. I don't know how long the "Try this API" token stays valid, but it is currently the only way I have found to get the draft posts. Maybe you can try opening a bug ticket, but I don't know specifically where to do that.
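When experimenting with raw requests as above, the query string can be built with urllib.parse instead of string interpolation. Below is a minimal sketch. It assumes, based on the "Try this API" link passing status as an array, that status may be repeated to request several statuses. posts_list_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def posts_list_url(blog_id, statuses=('DRAFT',), order_by='UPDATED'):
    """Build a Blogger v3 posts.list URL with one status=... pair per status."""
    query = urlencode(
        [('status', s) for s in statuses] + [('orderBy', order_by), ('alt', 'json')]
    )
    return 'https://blogger.googleapis.com/v3/blogs/%s/posts?%s' % (quote(blog_id, safe=''), query)
```

It would replace the hard-coded URLs in the script, e.g. requests.get(posts_list_url(blog_id), headers=headers).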

Explanation of error viewing status of Google Drive API v3 upload using next_chunk() in Python?

I am trying to do a resumable upload to google drive using the v3 api. I want to be able to display a status bar of the upload. I can get it to upload easily and quickly if I do not need a status bar because I can use the .execute() function and it uploads. The problems arise when I want to upload the files in chunks. I've seen a few solutions to this on here and other places, but they don't seem to work.
This is my code for uploading:
CHUNK_SIZE = 256*1024
file_metadata = {'name': file_name, 'parents': [folder_id]}  # Metadata for the file we are going to upload
media = MediaFileUpload(file_path, mimetype='application/zip', chunksize=CHUNK_SIZE, resumable=True)
file = service.files().create(body=file_metadata, media_body=media, fields='id')
progress = progressBarUpload(file)  # create an instance of the progress bar class
progress.exec_()  # execute it
progress.hide()  # hide it off the screen after
print(file_name + " uploaded successfully")
return 1  # returns 1 if it was successful
The progress bar starts a thread for my GUI, which then calls next_chunk(); that code is here:
signal = pyqtSignal(int)

def __init__(self, file):
    super(ThreadUpload, self).__init__()
    self.file = file

def run(self):
    done = False
    while done == False:
        status, done = self.file.next_chunk()
        print("status->", status)
        if status:
            value = int(status.progress() * 100)
            print("Uploaded", value, "%")
            self.signal.emit(value)
The problem I'm getting is that status is None.
If I use the code below, it works correctly, but I cannot view the status of the upload this way. Adding .execute() makes it work; I drop the next_chunk() part when doing it this way:
CHUNK_SIZE = 256*1024
file_metadata = {'name': file_name, 'parents': [folder_id]}  # Metadata for the file we are going to upload
media = MediaFileUpload(file_path, mimetype='application/zip', chunksize=CHUNK_SIZE, resumable=True)
file = service.files().create(body=file_metadata, media_body=media, fields='id').execute()
The first method doesn't work whether or not I use it in the progress bar thread; the second method works both ways every time. I use the progress bar to view the status of downloads and a few other things and it works very well, so I'm pretty confident the problem is that status is None when uploading.
Thanks for the help in advance.

The problem is that you're comparing the response from the request (the done variable in your case) with False. After the first next_chunk() call the response is either None (upload still in progress) or, once the upload process has finished, an object containing the file ID; neither compares equal to False, so the loop stops after the first call. This is the code I tested, and it worked successfully:
CHUNK_SIZE = 256 * 1024
file_metadata = {'name': "Test resumable"}  # Metadata for the file we are going to upload
media = MediaFileUpload("test-image.jpg", mimetype='image/jpeg', chunksize=CHUNK_SIZE, resumable=True)
request = service.files().create(body=file_metadata, media_body=media, fields='id')
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print(status.progress())
print(response)
print("uploaded successfully")
In your case, you could change these two lines:
done = False
while done == False:
to this:
done = None
while done is None:
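The same loop can be kept independent of Qt by passing the progress reporter in as a callable. Below is a minimal sketch; upload_with_progress is a hypothetical name. As far as I know, google-api-python-client returns status=None together with the final response on the last next_chunk() call, so the sketch reports 100 explicitly after the loop.

```python
def upload_with_progress(request, on_progress):
    """Call next_chunk() until the upload finishes, reporting progress.

    request: object whose next_chunk() returns (status, response), e.g. an
    un-executed service.files().create(..., media_body=...) request.
    on_progress: callable receiving an int percentage.
    """
    response = None
    while response is None:  # response stays None until the last chunk
        status, response = request.next_chunk()
        if status:
            on_progress(int(status.progress() * 100))
    on_progress(100)  # the final call carries no status, so report completion here
    return response
```

Inside ThreadUpload.run() this would become upload_with_progress(self.file, self.signal.emit).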

Google Client API v3 - update a file on drive using Python

I'm trying to update the content of a file from a Python script using the Google Client API. The problem is that I keep receiving error 403:
An error occurred: <HttpError 403 when requesting https://www.googleapis.com/upload/drive/v3/files/...?alt=json&uploadType=resumable returned "The resource body includes fields which are not directly writable.">
I have tried removing metadata fields, but it didn't help.
The function to update the file is the following:
# File: utilities.py
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools


def update_file(service, file_id, new_name, new_description, new_mime_type,
                new_filename):
    """Update an existing file's metadata and content.
    Args:
        service: Drive API service instance.
        file_id: ID of the file to update.
        new_name: New name for the file.
        new_description: New description for the file.
        new_mime_type: New MIME type for the file.
        new_filename: Filename of the new content to upload.
        new_revision: Whether or not to create a new revision for this file.
    Returns:
        Updated file metadata if successful, None otherwise.
    """
    try:
        # First retrieve the file from the API.
        file = service.files().get(fileId=file_id).execute()
        # File's new metadata.
        file['name'] = new_name
        file['description'] = new_description
        file['mimeType'] = new_mime_type
        file['trashed'] = True
        # File's new content.
        media_body = MediaFileUpload(
            new_filename, mimetype=new_mime_type, resumable=True)
        # Send the request to the API.
        updated_file = service.files().update(
            fileId=file_id,
            body=file,
            media_body=media_body).execute()
        return updated_file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
And here is the whole script to reproduce the problem.
The goal is to substitute a file, retrieving its id by name.
If the file does not exist yet, the script will create it by calling insert_file (this function works as expected).
The problem is update_file, posted above.
from __future__ import print_function
from utilities import *
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools


def get_authenticated(SCOPES, credential_file='credentials.json',
                      token_file='token.json', service_name='drive',
                      api_version='v3'):
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    store = file.Storage(token_file)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(credential_file, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build(service_name, api_version, http=creds.authorize(Http()))
    return service


def retrieve_all_files(service):
    """Retrieve a list of File resources.
    Args:
        service: Drive API service instance.
    Returns:
        List of File resources.
    """
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = service.files().list(**param).execute()
            result.extend(files['files'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as error:
            print('An error occurred: %s' % error)
            break
    return result


def insert_file(service, name, description, parent_id, mime_type, filename):
    """Insert new file.
    Args:
        service: Drive API service instance.
        name: Name of the file to insert, including the extension.
        description: Description of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
        filename: Filename of the file to insert.
    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    media_body = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    body = {
        'name': name,
        'description': description,
        'mimeType': mime_type
    }
    # Set the parent folder (Drive v3 expects a plain list of folder IDs).
    if parent_id:
        body['parents'] = [parent_id]
    try:
        file = service.files().create(
            body=body,
            media_body=media_body).execute()
        # Uncomment the following line to print the File ID
        # print('File ID: %s' % file['id'])
        return file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None


# If modifying these scopes, delete the file token.json.
SCOPES = 'https://www.googleapis.com/auth/drive'


def main():
    service = get_authenticated(SCOPES)
    # Call the Drive v3 API
    results = retrieve_all_files(service)
    target_file_descr = 'Description of deploy.py'
    target_file = 'deploy.py'
    target_file_name = target_file
    target_file_id = [file['id'] for file in results if file['name'] == target_file_name]
    if len(target_file_id) == 0:
        print('No file called %s found in root. Create it:' % target_file_name)
        file_uploaded = insert_file(service, target_file_name, target_file_descr, None,
                                    'text/x-script.phyton', target_file_name)
    else:
        print('File called %s found. Update it:' % target_file_name)
        file_uploaded = update_file(service, target_file_id[0], target_file_name, target_file_descr,
                                    'text/x-script.phyton', target_file_name)
    print(str(file_uploaded))


if __name__ == '__main__':
    main()
To try the example, you need to set up the Google Drive API at https://console.developers.google.com/apis/dashboard, then save the credentials.json file and pass its path to get_authenticated(). The token.json file will be created after the first authentication and API authorization.

The problem is that the 'id' metadata field cannot be changed when updating a file, so it should not be in the body. Just delete it from the dict:
# File's new metadata.
del file['id'] # 'id' has to be deleted
file['name'] = new_name
file['description'] = new_description
file['mimeType'] = new_mime_type
file['trashed'] = True
I tried your code with this modification and it works.
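If the dict from files().get() is needed again later, deleting keys in place can be surprising. Below is a minimal sketch of a non-mutating alternative. writable_copy is a hypothetical name, and the default set contains only id, the one field this answer identifies as the problem; extend it if other read-only fields show up.

```python
# Fields to strip before sending a File resource back to files().update().
# Assumption: only 'id' is listed, per this answer; add others as needed.
READ_ONLY_KEYS = frozenset({'id'})


def writable_copy(file_resource, read_only=READ_ONLY_KEYS):
    """Return a copy of a File resource without the given read-only keys.

    Unlike `del file['id']`, the original dict is left untouched.
    """
    return {k: v for k, v in file_resource.items() if k not in read_only}
```

The update call would then pass body=writable_copy(file) instead of body=file.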

I also struggled a bit with this function and found that if you don't need to update the metadata, you can simply leave it out of the update call:
updated_file = service.files().update(fileId=file_id, media_body=media_body).execute()
At least, that worked for me.

The problem is stated in the error: "The resource body includes fields which are not directly writable." So try removing all of the metadata properties and then adding them back one by one. The one I would be suspicious of is trashed. Even though the API docs say it is writable, it shouldn't be: trashing a file has side effects beyond setting a boolean. Updating a file and trashing it at the same time is somewhat unusual. Are you sure that's what you intend?

SharePlum error : "Can't get User Info List"

I'm trying to use SharePlum, a Python module for SharePoint, but when I try to connect to my SharePoint site, SharePlum raises this error:
Traceback (most recent call last):
  File "C:/Users/me/Desktop/Sharpoint/sharpoint.py", line 13, in <module>
    site = Site(sharepoint_url, auth=auth)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py", line 46, in __init__
    self.users = self.GetUsers()
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py", line 207, in GetUsers
    raise Exception("Can't get User Info List")
Exception: Can't get User Info List
Here is the very short code that I have written:
auth = HttpNtlmAuth(username, password)
site = Site(sharepoint_url, auth=auth)
This error seems to indicate a bad username/password, but I'm pretty sure the ones I have are correct.

OK, it seems I found the solution to my problem: it's about the SharePoint URL I gave.
Take this example: https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary
You have to remove the last part: /DocumentLibrary.
Why remove this part specifically?
When you go deep enough into your SharePoint site, your URL will look something like this: https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary/Forms/AllItems.aspx?RootFolder=%2FYour%2FSharePoint%2FDocumentLibrary%2FmyPersonnalFolder&FolderCTID=0x0120008BBC54784D92004D1E23F557873CC707&View=%7BE149526D%2DFD1B%2D4BFA%2DAA46%2D90DE0770F287%7D
You can see that the rest of the path is now in RootFolder=%2FYour%2FSharePoint%2FDocumentLibrary%2Fmy%20personnal%20folder and no longer in the "normal" URL (if it were, the URL would look like https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary/myPersonnalFolder/).
What you have to remove is the end of the "normal" URL, so in this case /DocumentLibrary.
So my correct SharePoint URL to pass to SharePlum is https://www.mysharepoint.com/Your/SharePoint/
I'm pretty new to SharePoint, so I'm not really sure this is the right answer for other people with this problem; maybe someone who knows SharePoint better than I do can confirm?
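The trimming rule described in this answer (drop the query string and the final library segment) can be expressed with urllib.parse. Below is a minimal sketch. site_url_from_library_url is a hypothetical helper, and it assumes the URL ends at the document library, as in the example; it does not know anything about SharePoint itself.

```python
from urllib.parse import urlsplit, urlunsplit


def site_url_from_library_url(library_url):
    """Drop the query string and the last path segment (the library name)."""
    parts = urlsplit(library_url)
    path = parts.path.rstrip('/')
    parent = path.rsplit('/', 1)[0] + '/'  # keep the trailing slash, as in the example
    return urlunsplit((parts.scheme, parts.netloc, parent, '', ''))
```

With this helper, the connection would be Site(site_url_from_library_url(library_url), auth=auth).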

I know this is not an actual solution to your problem, and I would have just added a comment, but it was too long, so I'm posting it as an answer.
I can't replicate your issue, but by looking at the source code of shareplum.py you can see why the program throws the error. On line 196 of shareplum.py there is an if clause (if response.status_code == 200:) which checks whether the request to your SharePoint URL was successful (status code 200); if the request failed (any other status code), it raises the exception (Can't get User Info List). To find out more about your problem, open your shareplum.py file ("C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py") and add this line: print('{} {} Error: {} for url: {}'.format(response.status_code, 'Client'*(400 <= response.status_code < 500) + 'Server'*(500 <= response.status_code < 600), response.reason, response.url)) before line 207 (raise Exception("Can't get User Info List")). Your shareplum.py should then look like this:
# Parse Response
if response.status_code == 200:
    envelope = etree.fromstring(response.text.encode('utf-8'))
    listitems = envelope[0][0][0][0][0]
    data = []
    for row in listitems:
        # Strip the 'ows_' from the beginning with key[4:]
        data.append({key[4:]: value for (key, value) in row.items() if key[4:]})
    return {'py': {i['ImnName']: i['ID']+';#'+i['ImnName'] for i in data},
            'sp': {i['ID']+';#'+i['ImnName'] : i['ImnName'] for i in data}}
else:
    print('{} {} Error: {} for url: {}'.format(response.status_code, 'Client'*(400 <= response.status_code < 500) + 'Server'*(500 <= response.status_code < 600), response.reason, response.url))
    raise Exception("Can't get User Info List")
Now just run your program again and it should print out why it isn't working.
I know it is best not to modify files in installed Python modules, but if you know what you are changing, there is no problem; just delete the added line when you are finished.
Once you know the status code, you can look it up online: just search for it on Google or check List_of_HTTP_status_codes.
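The one-line print packs the Client/Server classification into string multiplication. The same message, spelled out more readably, is shown below as a standalone sketch (describe_http_failure is a hypothetical name). The format mirrors the message that requests' Response.raise_for_status() produces, so calling that on a response you fetch yourself would give a similar text without editing the installed package.

```python
def describe_http_failure(status_code, reason, url):
    """Format a failed response the same way as the print line added above."""
    if 400 <= status_code < 500:
        kind = 'Client'
    elif 500 <= status_code < 600:
        kind = 'Server'
    else:
        kind = ''  # same as the original expression for non-4xx/5xx codes
    return '{} {} Error: {} for url: {}'.format(status_code, kind, reason, url)
```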
