How would I download a file using the Google Drive API? Here is what I have so far:
CLIENT_ID = '255556'
CLIENT_SECRET = 'y8sR1'
DOCUMENT_ID = 'a123'
service=build('drive', 'v2')
# How to do the following line?
service.get_file(CLIENT_ID, CLIENT_SECRET, DOCUMENT_ID)
There are different ways to download a file using the Google Drive API. It depends on whether you are downloading a regular file or a Google document (which needs to be exported in a specific format).
For regular files stored in Drive, you can either use alt=media, which is the preferred option, as in:
GET https://www.googleapis.com/drive/v2/files/0B9jNhSvVjoIVM3dKcGRKRmVIOVU?alt=media
Authorization: Bearer ya29.AHESVbXTUv5mHMo3RYfmS1YJonjzzdTOFZwvyOAUVhrs
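If you are using the Python client, the same request can be made through the service object's authorized http instance. A minimal sketch, assuming service was built as in the question and DOCUMENT_ID holds the file's ID:

# Minimal sketch: download the file bytes with alt=media through the
# already-authorized http object attached to the service.
download_url = 'https://www.googleapis.com/drive/v2/files/%s?alt=media' % DOCUMENT_ID
resp, content = service._http.request(download_url)
if resp.status == 200:
    with open('downloaded_file', 'wb') as f:  # local filename is arbitrary
        f.write(content)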
The other method is to use the downloadUrl field, as in:
from apiclient import errors
# ...

def download_file(service, drive_file):
    """Download a file's content.

    Args:
      service: Drive API service instance.
      drive_file: Drive File instance.

    Returns:
      File's content if successful, None otherwise.
    """
    download_url = drive_file.get('downloadUrl')
    if download_url:
        resp, content = service._http.request(download_url)
        if resp.status == 200:
            print 'Status: %s' % resp
            return content
        else:
            print 'An error occurred: %s' % resp
            return None
    else:
        # The file doesn't have any content stored on Drive.
        return None
For Google documents, instead of using downloadUrl, you need to use exportLinks and specify the MIME type, for example:
download_url = file['exportLinks']['application/pdf']
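A rough sketch of the full flow, assuming file_id refers to a native Google document and service is an authorized Drive v2 client; it follows the same pattern as download_file above:

# Rough sketch: export a native Google document as PDF
drive_file = service.files().get(fileId=file_id).execute()
download_url = drive_file['exportLinks']['application/pdf']
resp, content = service._http.request(download_url)
if resp.status == 200:
    with open('exported.pdf', 'wb') as f:
        f.write(content)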
The rest of the documentation can be found here:
https://developers.google.com/drive/web/manage-downloads
Related
I'm working in a Python web environment and I can simply upload a file from the filesystem to S3 using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web (say https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large).
Should I somehow download the image to the filesystem, and then upload it to S3 using boto as usual, then delete the image?
What would be ideal is if there is a way to get boto's key.set_contents_from_file or some other command that would accept a URL and nicely stream the image to S3 without having to explicitly download a file copy to my server.
def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)
        k.make_public()
        return "Success?"
    except Exception, e:
        return e
Using set_contents_from_file, as above, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a "No such file or directory" error. The boto storage documentation leaves off at uploading local files and does not mention uploading files stored remotely.
Here is how I did it with requests, the key being to set stream=True when initially making the request and to upload to S3 using the upload_fileobj() method:
import requests
import boto3
url = "https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"
r = requests.get(url, stream=True)
session = boto3.Session()
s3 = session.resource('s3')
bucket_name = 'your-bucket-name'
key = 'your-key-name' # key is the name of file on your bucket
bucket = s3.Bucket(bucket_name)
bucket.upload_fileobj(r.raw, key)
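One caveat (an assumption worth checking against the URLs you fetch): r.raw is the raw, undecoded stream, so if the server applies gzip or deflate transfer encoding, the object stored in S3 will still be compressed. You can ask urllib3 to decode it on the fly before uploading:

r = requests.get(url, stream=True)
r.raw.decode_content = True  # let urllib3 decode gzip/deflate while streaming
bucket.upload_fileobj(r.raw, key)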
OK, from @garnaat's answer, it doesn't sound like S3 currently allows uploads by URL. I managed to upload remote images to S3 by reading them into memory only. This works.
def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = url.split('/')[::-1][0]  # In my situation, ids at the end are unique
        file_object = urllib2.urlopen(url)  # 'Like' a file object
        fp = StringIO.StringIO(file_object.read())  # Wrap object
        k.set_contents_from_file(fp)
        return "Success"
    except Exception, e:
        return e
Also thanks to How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?
For a 2017-relevant answer to this question which uses the official 'boto3' package (instead of the old 'boto' package from the original answer):
Python 3.5
If you're on a clean Python install, pip install both packages first:
pip install boto3
pip install requests
import boto3
import requests
# Uses the creds in ~/.aws/credentials
s3 = boto3.resource('s3')
bucket_name_to_upload_image_to = 'photos'
s3_image_filename = 'test_s3_image.png'
internet_image_url = 'https://docs.python.org/3.7/_static/py.png'
# Do this as a quick and easy check to make sure your S3 access is OK
good_to_go = False  # initialise so the check below can't raise a NameError
for bucket in s3.buckets.all():
    if bucket.name == bucket_name_to_upload_image_to:
        print('Good to go. Found the bucket to upload the image into.')
        good_to_go = True

if not good_to_go:
    print('Not seeing your s3 bucket, might want to double check permissions in IAM')
# Given an Internet-accessible URL, download the image and upload it to S3,
# without needing to persist the image to disk locally
req_for_image = requests.get(internet_image_url, stream=True)
file_object_from_req = req_for_image.raw
req_data = file_object_from_req.read()
# Do the actual upload to s3
s3.Bucket(bucket_name_to_upload_image_to).put_object(Key=s3_image_filename, Body=req_data)
Unfortunately, there really isn't any way to do this. At least not at the moment. We could add a method to boto, say set_contents_from_url, but that method would still have to download the file to the local machine and then upload it. It might still be a convenient method but it wouldn't save you anything.
In order to do what you really want to do, we would need to have some capability on the S3 service itself that would allow us to pass it the URL and have it store the URL to a bucket for us. That sounds like a pretty useful feature. You might want to post that to the S3 forums.
A simple three-line implementation that works on a Lambda out of the box:
import boto3
import requests
s3_object = boto3.resource('s3').Object(bucket_name, object_key)
with requests.get(url, stream=True) as r:
    s3_object.put(Body=r.content)
The source for the .get part comes straight from the requests documentation
import boto3
import requests
from io import BytesIO

def send_image_to_s3(url, name):
    print("sending image")
    bucket_name = 'XXX'
    AWS_SECRET_ACCESS_KEY = "XXX"
    AWS_ACCESS_KEY_ID = "XXX"
    s3 = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
    response = requests.get(url)
    img = BytesIO(response.content)
    file_name = f'path/{name}'
    print('sending {}'.format(file_name))
    r = s3.upload_fileobj(img, bucket_name, file_name)
    s3_path = 'path/' + name
    return s3_path
I have tried the following with boto3 and it works for me:
import boto3
import contextlib
import requests
from io import BytesIO

s3 = boto3.resource('s3')
s3Client = boto3.client('s3')
for bucket in s3.buckets.all():
    print(bucket.name)

url = "#resource url"
with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
    # Set up file stream from response content.
    fp = BytesIO(response.content)
    # Upload data to S3
    s3Client.upload_fileobj(fp, 'aws-books', 'reviews_Electronics_5.json.gz')
Using the boto3 upload_fileobj method, you can stream a file to an S3 bucket, without saving to disk. Here is my function:
import boto3
import StringIO
import contextlib
import requests

def upload(url):
    # Get the service client
    s3 = boto3.client('s3')
    # Remember to set stream=True.
    with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
        # Set up file stream from response content.
        fp = StringIO.StringIO(response.content)
        # Upload data to S3
        s3.upload_fileobj(fp, 'my-bucket', 'my-dir/' + url.split('/')[-1])
S3 doesn't support remote upload as of now, it seems. You may use the class below for uploading an image to S3. The upload method here first tries to download the image and keeps it in memory until it gets uploaded. To be able to connect to S3 you will have to install the AWS CLI using pip install awscli, then enter a few credentials using the command aws configure.
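For reference, those two setup commands are simply:

pip install awscli
aws configure   # prompts for access key, secret key, default region and output format

The uploader class itself then looks like this: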
import boto3
import urllib3
import uuid
from pathlib import Path
from io import BytesIO
from errors import custom_exceptions as cex

BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"

class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH

    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()

    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed

        extension = Path(url).suffix
        id = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + id
        try:
            self.client.upload_fileobj(image_file,
                                       self.bucket_name,
                                       final_path)
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed
        return CLOUDFRONT_BASE_URL + id
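Usage is then roughly (the URL here is just a hypothetical example):

s3 = S3()
cdn_url = s3.upload_image("https://example.com/some_poster.jpg")  # hypothetical URL
print(cdn_url)  # CloudFront URL of the uploaded object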
import boto
from boto.s3.key import Key
from boto.s3.connection import OrdinaryCallingFormat
from urllib import urlopen

def upload_images_s3(img_url):
    try:
        connection = boto.connect_s3('access_key', 'secret_key', calling_format=OrdinaryCallingFormat())
        bucket = connection.get_bucket('boto-demo-1519388451')
        file_obj = Key(bucket)
        file_obj.key = img_url.split('/')[::-1][0]
        fp = urlopen(img_url)
        result = file_obj.set_contents_from_string(fp.read())
    except Exception, e:
        return e
I am able to upload a file to Documents in SharePoint, but I want to upload it to a specific folder and I am not sure how to do that. Any help is welcome. I am using the code below to upload files.
def upload_file(ctx, listTitle, path):
    list_obj = ctx.web.lists.get_by_title(listTitle)
    folder = list_obj.root_folder
    ctx.load(folder)
    ctx.execute_query()
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    with open(path, 'rb') as f:
        content = f.read()
    file_creation_information = FileCreationInformation()
    file_creation_information.overwrite = True
    file_creation_information.url = os.path.basename(path)
    file_creation_information.content = content
    file_new = files.add(file_creation_information)
    ctx.load(files)
    ctx.execute_query()

upload_file(ctx, '/Documents/reports/', path)
The code above works fine for upload_file(ctx, 'Documents', report), but it doesn't work for upload_file(ctx, 'Documents/reports', report).
Error:
office365.runtime.client_request_exception.ClientRequestException: ('-1, System.ArgumentException', "List 'reports' does not exist at site with URL 'https://sharepoint.com/sites/my_page'.", "404 Client Error: Not Found for url: https://sharepoint.com/sites/my_page/_api/Web/lists/GetByTitle('reports')/RootFolder")
The error you're getting suggests that you are looking for a list called reports, but what you actually want is a folder object.
It's not clear which package you're using to connect to SharePoint, but the docs say that you need to make an API call like this:
POST https://{site_url}/_api/web/GetFolderByServerRelativeUrl('/Folder Name')/Files/add(url='a.txt',overwrite=true)
Authorization: "Bearer " + accessToken
Content-Length: {length of request body as integer}
X-RequestDigest: "{form_digest_value}"
"Contents of file"
Uploading a file to the Documents directory works because it's a folder inside the default Shared Documents library. You can only access folders directly inside the root with the .root_folder call.
Edit 2: Since I don't know which package you're using (probably Office365-REST), you can upload a file using the endpoint above with a normal requests.post call.
Should look like this:
import requests

url = f"https://{site_url}/_api/web/GetFolderByServerRelativeUrl('{specific_folder_url}')/Files/add(url='a.txt',overwrite=true)"
headers = {'Authorization': "Bearer " + accessToken,
           'Content-Length': content_length}
payload = your_byte_string
response = requests.post(url=url, headers=headers, data=payload)
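If you are indeed on Office365-REST-Python-Client, another option is to fetch the target folder by its server-relative URL instead of going through the list's root folder, and then reuse the FileCreationInformation pattern from the question. This is only a sketch based on my assumption about that package's API, so check it against the version you have installed:

# Sketch, assuming Office365-REST-Python-Client and an authenticated ctx
def upload_file_to_folder(ctx, folder_url, path):
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)
    ctx.load(folder)
    ctx.execute_query()
    with open(path, 'rb') as f:
        content = f.read()
    info = FileCreationInformation()
    info.overwrite = True
    info.url = os.path.basename(path)
    info.content = content
    folder.files.add(info)
    ctx.execute_query()

# server-relative URL of the subfolder (adjust to your site structure)
upload_file_to_folder(ctx, '/sites/my_page/Shared Documents/reports', path)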
I am trying to upload a video to a Google Cloud Storage bucket using a resumable upload.
But I always get the same error: (u'Response headers must contain header', u'location')
Here is my code:
client = _get_storage_client()
bucket = client.bucket(BUCKET_NAME, PROJECT_ID)
blob = bucket.blob(filename)

if 'video' in content_type:
    url = blob.create_resumable_upload_session(content_type=content_type, client=client)
    stream = io.BytesIO(stream_file.file.read())

    upload = ResumableUpload(
        upload_url=url,
        chunk_size=chunk_size
    )
    transport = AuthorizedSession(credentials=client._credentials)

    # Start using the Resumable Upload
    response = upload.initiate(
        transport=transport,
        content_type=content_type,
        stream=stream,
        metadata={'name': blob.name}
    )
    while upload.finished is False:
        upload.transmit_next_chunk(transport)
The error appears at upload.initiate().
Your problem may be in:
url = blob.create_resumable_upload_session(content_type=content_type,
client=client)
Check the post here; they use:
# Create a Resumable Upload
url = (
    f'https://www.googleapis.com/upload/storage/v1/b/'
    f'{bucket.name}/o?uploadType=resumable'
)
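Put together with the code from the question (a sketch reusing your own ResumableUpload and AuthorizedSession imports), the upload would be initiated against that raw uploadType=resumable endpoint rather than against the session URL returned by create_resumable_upload_session:

# Sketch: initiate the resumable upload against the upload endpoint itself
upload_url = (
    f'https://www.googleapis.com/upload/storage/v1/b/'
    f'{bucket.name}/o?uploadType=resumable'
)
upload = ResumableUpload(upload_url=upload_url, chunk_size=chunk_size)
transport = AuthorizedSession(credentials=client._credentials)
response = upload.initiate(
    transport=transport,
    content_type=content_type,
    stream=stream,
    metadata={'name': blob.name},
)
while not upload.finished:
    upload.transmit_next_chunk(transport)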
Your problem most probably has to do with authorization. The problem here is that the line
response = upload.initiate(
    transport=transport,
    content_type=content_type,
    stream=stream,
    metadata={'name': blob.name}
)
does not contain the Google Cloud response.
I would advise you to debug this statement; if you step into it, you will find:
method, url, payload, headers = self._prepare_initiate_request(
    stream, metadata, content_type,
    total_bytes=total_bytes, stream_final=stream_final)
result = _helpers.http_request(
    transport, method, url, data=payload, headers=headers,
    retry_strategy=self._retry_strategy)
self._process_initiate_response(result)
return result
If you inspect the 'result' variable, it will provide you with the HTTP status code (403 for not authorized). The content of the result will provide you with the reason and the access right that is required.
Another possibility is to send your request through a proxy and inspect the HTTP result.
I am using a Flask route as a proxy to download a file, like this:
@esa_handler.route("/data/<int:series>/<int:file_num>", methods=["GET"])
def DownloadRemote(series, file_num):
    """
    Downloads the remote files from the ESA.
    :param series: 0-20.
    :param file_num: File within the series, 0-255
    :return: Compressed CSV file.
    """
    # if the file is bad.
    if series >= 20 and file_num > 110:
        return jsonify({"error": "file does not exist."})
    url = "http://cdn.gea.esac.esa.int/Gaia/gaia_source/csv/GaiaSource_000-{:03d}-{:03d}.csv.gz".format(series,
                                                                                                        file_num)
    req = requests.get(url, stream=True)
    return Response(stream_with_context(req.iter_content(chunk_size=2048)), content_type=req.headers["content-type"])
It works fine; however, the filename presented to the client is whatever file number was passed to the endpoint. For example, if I put http://127.0.0.1:5000/esa/data/0/0 to download the very first file, it downloads, but Chrome/Firefox/IE/Edge offer to save the file with a filename of "0". While there is nothing wrong with that, I would like a better user experience.
How can I intercept the response to proffer a filename based off the URL requested?
This can be done with the Content-Disposition HTTP header. Here you can specify a filename for the newly downloaded file.
This can be added to a Flask Response as follows:
url = "http://cdn.gea.esac.esa.int/Gaia/gaia_source/csv/GaiaSource_000-{:03d}-{:03d}.csv.gz".format(series,
req = requests.get(url, stream=True)
headers = Headers()
headers .add('Content-Type', req.headers["content-type"])
headers .add('Content-Disposition', 'attachment; filename="filename.txt"')
return Response(stream_with_context(req.iter_content(chunk_size=2048)), headers=headers)
Note: The Content-Type was moved into headers for simplicity
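To base the filename on the requested URL instead of hardcoding it (a small sketch, assuming the remote URL always ends in the real filename), the hardcoded Content-Disposition line could be replaced with something like:

remote_name = url.rsplit('/', 1)[-1]  # e.g. GaiaSource_000-000-000.csv.gz
headers.add('Content-Disposition', 'attachment; filename="{}"'.format(remote_name))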
I'm building a website + backend with the Flask framework in which I use Flask-OAuthlib to authenticate with Google. After authentication, the backend needs to regularly scan the user's Gmail. So currently users can authenticate my app and I store the access_token and the refresh_token. The access_token expires after one hour, so within that one hour I can get the userinfo like so:
google = oauthManager.remote_app(
    'google',
    consumer_key='xxxxxxxxx.apps.googleusercontent.com',
    consumer_secret='xxxxxxxxx',
    request_token_params={
        'scope': ['https://www.googleapis.com/auth/userinfo.email', 'https://www.googleapis.com/auth/gmail.readonly'],
        'access_type': 'offline'
    },
    base_url='https://www.googleapis.com/oauth2/v1/',
    request_token_url=None,
    access_token_method='POST',
    access_token_url='https://accounts.google.com/o/oauth2/token',
    authorize_url='https://accounts.google.com/o/oauth2/auth'
)
token = (the_stored_access_token, '')
userinfoObj = google.get('userinfo', token=token).data
userinfoObj['id'] # Prints out my google id
Once the hour is over, I need to use the refresh_token (which I've got stored in my database) to request a new access_token. I tried replacing the_stored_access_token with the_stored_refresh_token, but this simply gives me an Invalid Credentials error.
In this github issue I read the following:
regardless of how you obtained the access token / refresh token (whether through an authorization code grant or resource owner password credentials), you exchange them the same way, by passing the refresh token as refresh_token and grant_type set to 'refresh_token'.
From this I understood I had to create a remote app like so:
google = oauthManager.remote_app(
    'google',
    # also the consumer_key, secret, request_token_params, etc..
    grant_type='refresh_token',
    refresh_token=u'1/xK_ZIeFn9quwvk4t5VRtE2oYe5yxkRDbP9BQ99NcJT0'
)
But this leads to a TypeError: __init__() got an unexpected keyword argument 'refresh_token'. So from here I'm kinda lost.
Does anybody know how I can use the refresh_token to get a new access_token? All tips are welcome!
This is how I get a new access_token for google:
from urllib2 import Request, urlopen, URLError
from webapp2_extras import json
import mimetools

BOUNDARY = mimetools.choose_boundary()

def refresh_token():
    url = google_config['access_token_url']
    headers = [
        ("grant_type", "refresh_token"),
        ("client_id", <client_id>),
        ("client_secret", <client_secret>),
        ("refresh_token", <refresh_token>),
    ]
    files = []
    edata = EncodeMultiPart(headers, files, file_type='text/plain')
    headers = {}
    request = Request(url, headers=headers)
    request.add_data(edata)
    request.add_header('Content-Length', str(len(edata)))
    request.add_header('Content-Type', 'multipart/form-data;boundary=%s' % BOUNDARY)
    try:
        response = urlopen(request).read()
        response = json.decode(response)
    except URLError, e:
        ...
The EncodeMultiPart function is taken from here:
https://developers.google.com/cloud-print/docs/pythonCode
Be sure to use the same BOUNDARY
Looking at the source code for OAuthRemoteApp, the constructor does not take a keyword argument called refresh_token. It does, however, take an argument called access_token_params, which is an optional dictionary of parameters to forward to the access token URL.
Since the URL is the same but the grant type is different, I imagine a call like this should work:
google = oauthManager.remote_app(
    'google',
    # also the consumer_key, secret, request_token_params, etc..
    grant_type='refresh_token',
    access_token_params={
        'refresh_token': u'1/xK_ZIeFn9quwvk4t5VRtE2oYe5yxkRDbP9BQ99NcJT0'
    }
)
flask-oauthlib.contrib contains a parameter named auto_refresh_url / refresh_token_url in the remote_app which does exactly what you want to do. An example of how to use it looks like this:
app = oauth.remote_app(
    [...]
    refresh_token_url='https://www.douban.com/service/auth2/token',
    authorization_url='https://www.douban.com/service/auth2/auth',
    [...]
)
However, I did not manage to get it running this way. Nevertheless, this is possible without the contrib package. My solution was to catch 401 API calls and redirect to a refresh page if a refresh_token is available.
My code for the refresh endpoint looks as follows:
@app.route('/refresh/')
def refresh():
    data = {}
    data['grant_type'] = 'refresh_token'
    data['refresh_token'] = session['refresh_token'][0]
    data['client_id'] = CLIENT_ID
    data['client_secret'] = CLIENT_SECRET

    # make custom POST request to get the new token pair
    resp = remote.post(remote.access_token_url, data=data)

    # checks the response status and parses the new tokens
    # if refresh failed will redirect to login
    parse_authorized_response(resp)
    return redirect('/')


def parse_authorized_response(resp):
    if resp is None:
        return 'Access denied: reason=%s error=%s' % (
            request.args['error_reason'],
            request.args['error_description']
        )
    if isinstance(resp, dict):
        session['access_token'] = (resp['access_token'], '')
        session['refresh_token'] = (resp['refresh_token'], '')
    elif isinstance(resp, OAuthResponse):
        print(resp.status)
        if resp.status != 200:
            session['access_token'] = None
            session['refresh_token'] = None
            return redirect(url_for('login'))
        else:
            session['access_token'] = (resp.data['access_token'], '')
            session['refresh_token'] = (resp.data['refresh_token'], '')
    else:
        raise Exception()

    return redirect('/')
Hope this will help. The code can be enhanced of course and there surely is a more elegant way than catching 401ers but it's a start ;)
One other thing: do not store the tokens in the Flask session cookie. Rather, use server-side sessions from "Flask-Session", which is what I did in my code!
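For completeness, switching to server-side sessions with the Flask-Session extension is only a few lines; this sketch uses the filesystem backend, but pick whichever backend fits your deployment:

from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config['SESSION_TYPE'] = 'filesystem'  # tokens now live on the server, not in the cookie
Session(app)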
This is how I got my new access token.
from urllib2 import Request, urlopen, URLError
import json
import mimetools

BOUNDARY = mimetools.choose_boundary()
CRLF = '\r\n'

def EncodeMultiPart(fields, files, file_type='application/xml'):
    """Encodes list of parameters and files for HTTP multipart format.

    Args:
      fields: list of tuples containing name and value of parameters.
      files: list of tuples containing param name, filename, and file contents.
      file_type: string if file type different than application/xml.

    Returns:
      A string to be sent as data for the HTTP post request.
    """
    lines = []
    for (key, value) in fields:
        lines.append('--' + BOUNDARY)
        lines.append('Content-Disposition: form-data; name="%s"' % key)
        lines.append('')  # blank line
        lines.append(value)
    for (key, filename, value) in files:
        lines.append('--' + BOUNDARY)
        lines.append(
            'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (key, filename))
        lines.append('Content-Type: %s' % file_type)
        lines.append('')  # blank line
        lines.append(value)
    lines.append('--' + BOUNDARY + '--')
    lines.append('')  # blank line
    return CRLF.join(lines)


def refresh_token():
    url = "https://oauth2.googleapis.com/token"
    headers = [
        ("grant_type", "refresh_token"),
        ("client_id", "xxxxxx"),
        ("client_secret", "xxxxxx"),
        ("refresh_token", "xxxxx"),
    ]
    files = []
    edata = EncodeMultiPart(headers, files, file_type='text/plain')
    #print(EncodeMultiPart(headers, files, file_type='text/plain'))
    headers = {}
    request = Request(url, headers=headers)
    request.add_data(edata)
    request.add_header('Content-Length', str(len(edata)))
    request.add_header('Content-Type', 'multipart/form-data;boundary=%s' % BOUNDARY)
    response = urlopen(request).read()
    print(response)

refresh_token()
#response = json.decode(response)
#print(refresh_token())
With your refresh_token, you can get a new access_token like:
from google.oauth2.credentials import Credentials
from google.auth.transport import requests

creds = {"refresh_token": "<goes here>",
         "token_uri": "https://accounts.google.com/o/oauth2/token",
         "client_id": "<YOUR_CLIENT_ID>.apps.googleusercontent.com",
         "client_secret": "<goes here>",
         "scopes": ["https://www.googleapis.com/auth/userinfo.email"]}

cred = Credentials.from_authorized_user_info(creds)
cred.refresh(requests.Request())
my_new_access_token = cred.token
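The refreshed token can then be used exactly like the original access_token, for example as a Bearer header against the userinfo endpoint from the question (a sketch; the endpoint is derived from the base_url above):

import requests as http  # plain requests, aliased to avoid clashing with google.auth.transport.requests

resp = http.get(
    'https://www.googleapis.com/oauth2/v1/userinfo',
    headers={'Authorization': 'Bearer ' + my_new_access_token},
)
print(resp.json())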