Authenticate to Google Drive and download spreadsheet with Python urllib2/requests - python

I would like to download a document from my Google Drive by authenticating to Google (I only want certain users to be able to access it and do not want to publish it on the web).
I have tried using requests, but apparently I am doing something wrong.
From a browser I can download my document by going to the address
https://docs.google.com/spreadsheets/d/<document key>/export?format=xls.
So in my Python script I do the following:
import os
import requests
import shutil
from requests.auth import HTTPBasicAuth
remote = "https://docs.google.com/spreadsheets/d/<document key>/export?format=xls"
username = os.environ['GOOGLEUSERNAME']
password = os.environ['GOOGLEPASSWORD']
r = requests.get(remote, auth=HTTPBasicAuth(username,password))
if r.status_code == 200:
    with open("document.xls","wb") as f:
        shutil.copyfileobj(r.raw, f)
However, the resulting document.xls is empty.
What am I doing wrong?

What you are trying to do might actually be possible, but here are some reasons why it will be non-trivial (by no means a complete list):
Google usually blocks non-browser user agents (like your Python script) from content intended for browsers, for security reasons; you would have to spoof the user agent, which is actually easy
Multi-factor authentication: you would have to turn it off (easy, but you open yourself up to being hacked...)
Session cookie, aka security cookie (not so easy to get hold of)
What you should do instead
Use the official Google Drive API. The Python client library has a nice tutorial, and this page describes how to download files from Google Drive.
If you want to write even less code, libraries like PyDrive will make your life even easier.
Hope this helps!
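To make the API route a bit more concrete, here is a minimal sketch using the google-api-python-client package with a service account. The key file path, the read-only scope, the CSV export type, and both helper names are assumptions for illustration and do not come from the question. The spreadsheet would also have to be shared with the service account's email address.

```python
def build_drive_service(key_file):
    """Build a Drive v3 client from a service-account key file (path is an assumption)."""
    # Imported lazily so export_spreadsheet() can be tried without these packages.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    scopes = ["https://www.googleapis.com/auth/drive.readonly"]
    creds = service_account.Credentials.from_service_account_file(key_file, scopes=scopes)
    return build("drive", "v3", credentials=creds)


def export_spreadsheet(service, file_id, out_path, mime_type="text/csv"):
    """Export a Google Sheet as mime_type and write it to out_path."""
    request = service.files().export_media(fileId=file_id, mimeType=mime_type)
    content = request.execute()  # raw bytes of the exported document
    with open(out_path, "wb") as f:
        f.write(content)
    return len(content)
```

Usage would look roughly like export_spreadsheet(build_drive_service("service-account.json"), "<document key>", "document.csv"), where the key file name is again hypothetical.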

I might have a simple solution for you, depending on what exactly the auth requirements are. You are saying
I only want certain users to be able to access it and do not want to
publish it on the web
From this statement alone, it may be sufficient for you to create a "secret" link for your document, and share this among your users. You can then easily retrieve this document automatically, for instance with wget, and specify the format, e.g. csv:
wget -O data.csv "https://docs.google.com/spreadsheets/d/***SHARED-SECRET***/export?format=csv"
Or, in Python (2):
import urllib2
from cookielib import CookieJar
spreadsheet_url = "https://docs.google.com/spreadsheets/d/***SHARED-SECRET***/export?format=csv"
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))
response = opener.open(spreadsheet_url)
with open("data.csv", "wb") as f:
    f.write(response.read())
I am actually using this in production; it works reliably, without faking the user agent.
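For Python 3, where urllib2 and cookielib were renamed, the same idea might look like the sketch below. The helper names export_url and download are made up for this example, and the document key is still whatever secret link you shared.

```python
import shutil
import urllib.request
from http.cookiejar import CookieJar


def export_url(document_key, fmt="csv"):
    """Build the export URL for a spreadsheet that is shared by link."""
    return f"https://docs.google.com/spreadsheets/d/{document_key}/export?format={fmt}"


def download(url, out_path):
    """Fetch url with a cookie-aware opener and stream the body to out_path."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    with opener.open(url) as response, open(out_path, "wb") as f:
        shutil.copyfileobj(response, f)  # copies in chunks instead of one big read()
```

Calling download(export_url("***SHARED-SECRET***"), "data.csv") would then mirror the Python 2 snippet.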

Related

How to post request to API using only code?

I am developing a DAG to be scheduled on Apache Airflow whose main purpose will be to post survey data (in JSON format) to an API and then get a response (the answers to the surveys). Since this whole process is going to be automated, every part of it has to be programmed in the DAG, so I can't use Postman or any similar app (unless there is a way to automate their usage, but I don't know if this is possible).
I was thinking of using the requests library for Python, and the function I've written for posting the json to the API looks like this:
def postFileToAPI(**context):
    print('uploadFileToAPI() ------ ')
    json_file = context['ti'].xcom_pull(task_ids='toJson') ## this pulls the json file from a previous task
    print('--------------- Posting survey request to API')
    r = requests.post('https://[request]', data = json_file)
(I haven't finished defining the http link for the request because my source data is incomplete.)
However, since this is my first time working with APIs and the requests library, I don't know if this is enough. For example, I'm unsure whether I need to provide a token from the API to perform the request.
I also don't know if there are other libraries that are better suited for this or that could be a good support.
In short: I don't know if what I'm doing will work as intended, what other information I need to provide to my DAG, or if there are any libraries to make my work easier.
The Python requests package that you're using is all you need, unless you're making a request that needs extra authorisation. In that case you should also import, for example, requests_jwt (from requests_jwt import JWTAuth) if you're using JSON web tokens, or whichever requests add-on package matches your authorisation style.
You make each POST or GET request as a separate, individual call.
Include the URL and data arguments as you have done and that should work!
You may also need headers and/or auth arguments to get through security,
e.g. for the GitLab API for a private repository you would include these extra arguments, where GITLAB_TOKEN is a GitLab web token:
```
headers={'PRIVATE-TOKEN': GITLAB_TOKEN},
auth=JWTAuth(GITLAB_TOKEN)
```
Just try it and it should work. If it doesn't, test the API with curl requests directly in the terminal, or let us know :)
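As a rough illustration of the headers and token points above, a posting helper could look like the sketch below. The bearer-token scheme, the function name post_survey, and the optional session parameter are all assumptions; the real API may expect a different header or no token at all, so check its documentation.

```python
import json


def post_survey(url, payload, token=None, session=None, timeout=30):
    """POST payload as JSON and return the decoded JSON response."""
    if session is None:
        import requests  # third-party; only needed when no session is passed in
        session = requests.Session()

    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the API expects a bearer token in the Authorization header.
        headers["Authorization"] = "Bearer " + token

    response = session.post(url, data=json.dumps(payload), headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx so the Airflow task fails visibly
    return response.json()
```

Raising on HTTP errors is a deliberate choice here: an exception inside the task callable marks the Airflow task as failed instead of silently continuing.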

Web tfs and Python3 integration

What is the right way to insert data from python into web tfs?
I have results from Jenkins automation for a specific suite and its test cases. I have extracted the results into a Python script as JSON. I would like to change the outcome of the same test cases in web TFS. Please advise.
Not sure I totally get your point. It seems you just want to update TFS test case results.
You could use the REST API to handle this. The following call updates test results in a test run:
PATCH https://dev.azure.com/{organization}/{project}/_apis/test/Runs/{runId}/results?api-version=5.1
Since you are using Python, you can use a Python script to access the Team Foundation Server (TFS) REST API.
First you need to connect to your TFS server from Python. TFS uses the NTLM authentication protocol, so you should use HTTP NTLM authentication with the requests library.
Code Snippet:
import requests
from requests_ntlm import HttpNtlmAuth
username = '<DOMAIN>\\<UserName>'
password = '<Password>'
tfsApi = 'https://{myserver}/tfs/collectionName/_apis/projects?api-version=2.0'
tfsResponse = requests.get(tfsApi,auth=HttpNtlmAuth(username,password))
if tfsResponse.ok:
    tfsResponse = tfsResponse.json()
    print(tfsResponse)
else:
    tfsResponse.raise_for_status()
For more details, take a look at this blog.
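Putting the PATCH endpoint and the authentication together, an update could be sketched as follows. The shape of the Jenkins data (a dict of TFS result id to Jenkins status), the status mapping, and the helper names are hypothetical; the body fields id, outcome, and state are my reading of the test results API and should be checked against its reference before use.

```python
import json

# Hypothetical mapping from Jenkins statuses to TFS outcome strings.
OUTCOME_BY_STATUS = {
    "PASSED": "Passed",
    "FIXED": "Passed",
    "FAILED": "Failed",
    "REGRESSION": "Failed",
    "SKIPPED": "NotExecuted",
}


def build_results_payload(jenkins_results):
    """Turn {tfs_result_id: jenkins_status} into the JSON array for the PATCH call."""
    return [
        {"id": result_id, "outcome": OUTCOME_BY_STATUS[status], "state": "Completed"}
        for result_id, status in sorted(jenkins_results.items())
    ]


def update_results(base_url, run_id, payload, auth, api_version="5.1"):
    """Send the PATCH; base_url is e.g. https://dev.azure.com/{organization}/{project}."""
    import requests  # third-party

    url = f"{base_url}/_apis/test/Runs/{run_id}/results?api-version={api_version}"
    response = requests.patch(
        url,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
        auth=auth,  # for on-premises TFS this could be HttpNtlmAuth(username, password)
    )
    response.raise_for_status()
    return response.json()
```

Keeping the payload builder separate from the network call makes it easy to inspect the JSON before anything is sent to the server.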

Authentication to Google Cloud Python API Library stopped working

I have problems with the authentication in the Python Library of Google Cloud API.
At first it worked for some days without problems, but suddenly the API calls stopped showing up in the API Overview of the Google Cloud Platform.
I created a service account and stored the json file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the json file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes the audio_file successfully, but the Dashboard for my Google Cloud project doesn't show anything for the activated Speech API Graph.
import io
with io.open(audio_file, 'rb') as f:
    audio = client.sample(f.read(), source_uri=None, sample_rate=48000, encoding=speech.encoding.Encoding.FLAC)
alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran into some errors regarding the usage limit. I guess that, due to the unsuccessful authentication, the free/limited option is used somehow.
I also tried the alternative option for authentication by installing the Google Cloud SDK and gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count (Fehler) is increasing with calls to the API. How can I get detailed information about the error?!
Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
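A quick local check of the first point could look like the sketch below. It only inspects the environment variable and the key file, and the function name check_credentials_env is made up for this example; it does not contact Google or validate the token scope.

```python
import json
import os


def check_credentials_env(env=os.environ):
    """Return a list of problems found with GOOGLE_APPLICATION_CREDENTIALS."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]

    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute: " + path)
    if not os.path.isfile(path):
        problems.append("file does not exist: " + path)
        return problems

    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    # Service-account key files carry these two fields.
    if info.get("type") != "service_account":
        problems.append("key file type is %r, expected 'service_account'" % info.get("type"))
    if not info.get("client_email"):
        problems.append("key file has no client_email")
    return problems
```

An empty list means the variable points at an absolute path to something that looks like a service-account key.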
Sometimes you will get different error information if you initialize the client with GRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue. Can you try with the sample audio, or try generating LINEAR16 samples using something like the Unix rec tool:
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio_sample = speech_client.sample(
        content,
        source_uri=None,
        encoding='LINEAR16',
        sample_rate=44100)
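Since a mismatch between the file and the declared encoding or sample rate is a common cause, it can help to read the WAV header before calling the API. This is a small sketch using only the standard library wave module; the function name describe_wav is made up, and it applies to WAV (LINEAR16) files such as the one produced by rec, not to FLAC.

```python
import wave


def describe_wav(path):
    """Read the WAV header and return the parameters the API call should match."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bits": w.getsampwidth() * 8,      # sample width is reported in bytes
            "sample_rate": w.getframerate(),
            "seconds": w.getnframes() / float(w.getframerate()),
        }
```

For the rec command above you would expect one channel, 16 bits, and a sample rate of 44100, which is the value to pass as sample_rate.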
Other notes:
Sync Recognize is limited to 60 seconds of audio; you must use async for longer audio
If you haven't already, set up billing for your account
With regard to the usage problem, the issue is in fact that when you use the new google-cloud library to access ML APIs, it seems everyone authenticates to a project shared by everyone (hence it says you've used up your limit even though you've not used anything). To check and confirm this, you can use the Python client library to call an ML API that you have not enabled, which will give you a result even though it shouldn't. This problem persists across other language client libraries and operating systems, so I suspect it's an issue with their gRPC layer.
Because of this, to ensure consistency I always use the older googleapiclient, which uses my API key. Here is an example using the Translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery
service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize()
result = service_request.execute()
print(result)
You can get the list of the discovery APIs at https://developers.google.com/api-client-library/python/apis/ with the speech one located in https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
Another benefit of using the discovery library is that you get a lot more options compared to the current library, although it's often a bit more of a pain to implement.

Google Appengine: Requests Alternative

I have a non-GAE application/request-handler that uses the Python requests module to send an uploaded image via a POST request, as binary:
headers = {"MyAuth" : "xyz"}
r = requests.post(base_uri, data=open('0.jpg', 'rb'), headers=headers)
The user uploads an image, the uploaded image is saved locally, opened for reading, then sent to a remote classifier pipeline via post request - this returns some JSON regarding the image features, which can then be returned to the user.
I need to implement this behaviour in a GAE app, but know that GAE has no traditional file system, so I will have to use StringIO:
data = ... #some jpg => str
headers = {"MyAuth" : "xyz"}
r = requests.post(base_uri, data=StringIO.StringIO(data), headers=headers)
How could I completely replace the requests module in this example in a GAE friendly way?
Many thanks.
A commonly used module for making HTTP requests on App Engine is urlfetch; it is available in the default runtime via google.appengine.api.urlfetch. Supposedly urllib2 and/or urllib3 are also options, but I have not used those myself so I can't say for sure.
You can also install requests in your App Engine directory and upload it with the project, but I find that a bit of a hassle, since requests has its own dependencies that you will need to include as well.
Also see Using the Requests python library in Google App Engine
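With urlfetch, the POST from the question might be sketched like this. The function name post_image and the optional fetch parameter are assumptions added so the helper can be exercised outside App Engine; inside the runtime the default urlfetch.fetch is used, and the image bytes are passed directly as the payload, so no StringIO wrapper is needed.

```python
def post_image(url, image_bytes, headers, fetch=None):
    """POST raw image bytes and return (status_code, body)."""
    if fetch is None:
        # Only importable inside the App Engine runtime.
        from google.appengine.api import urlfetch
        fetch = urlfetch.fetch

    result = fetch(url, payload=image_bytes, method="POST", headers=headers)
    return result.status_code, result.content
```

A call such as post_image(base_uri, data, {"MyAuth": "xyz"}) would then return the status code and the JSON body as a string to parse.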
Although probably not the best solution to this problem, I managed to get requests 2.3.0 to work in the GAE project with:
pip install --target myproject/externals/ requests==2.3.0
I can now use requests as I would normally.

Loading cookies in python

I am a novice programmer attempting to access Google Insights using Python. I can access sites which don't require cookies fine, but I can't seem to pass the cookies along properly. The cookies file was exported from Mozilla Firefox and is on the Z: drive, which is also where I'm running Python from.
I'm also pretty sure my code for saving the file could be done better than reading and writing, but I don't know how to do that either. Any help would be appreciated.
import urllib2
import cookielib
import os
url = "http://www.google.com/insights/search/overviewReport?q=eagles%2Ccsco&geo=US&cmpt=q&content=1&export=2"
cj = cookielib.MozillaCookieJar()
cj.load('cookies6.txt')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
file = opener.open(url)
output = open('test2.csv','wb')
output.write(file.read())
output.close()
I haven't tested your code, however:
As far as I can tell, there seems to be nothing wrong with your code
I've tried the URL you're requesting and had no problems downloading the CSV without any cookies
In my previous experience with Google, you might be looking at the problem the wrong way: it is not that you don't have the right cookies, but that Google automatically blocks requests from bots. If this is the case, you must replace the User-Agent HTTP header to mimic an actual browser. Beware, however, that this is against Google's terms of service, and if you make too many requests per minute Google will block all requests from your IP for about 8 hours.
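On the side question about saving the file, a Python 3 sketch of the same cookie-loading approach is shown below, streaming the response to disk with shutil.copyfileobj instead of read() and write(). The helper names and the optional user_agent parameter are assumptions for this example; whether to set a custom user agent at all is subject to the terms-of-service caveat above.

```python
import shutil
import urllib.request
from http.cookiejar import MozillaCookieJar


def open_with_cookies(cookie_file, user_agent=None):
    """Return (opener, jar) with cookies loaded from a Netscape-format cookies.txt."""
    jar = MozillaCookieJar()
    jar.load(cookie_file)  # raises LoadError if the file is not in the expected format
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if user_agent:
        opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def save_to_file(opener, url, out_path):
    """Stream the response body to out_path in chunks."""
    with opener.open(url) as response, open(out_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Checking len(jar) after loading is a quick way to see whether any cookies were actually read; expired or session-only entries are skipped by default.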
