Using Google BigQuery through a Python script

I want to perform some very simple tasks on BigQuery via a Python script. I found this package, but it does not seem to work well. Indeed, when I try this code:
from bigquery import get_client
project_id = 'txxxxxxxxxxxxxxxxxx9'
# Service account email address as listed in the Google Developers Console.
service_account = '7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
# PKCS12 or PEM key provided by Google.
key = '/home/fxxxxxxxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxxxxxxxxxxxxxxx.pem'
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
# Submit an async query.
results = client.get_table_schema('newdataset', 'newtable2')
print(results)
I get this error:
/home/xxxxxx/anaconda3/envs/snakes/bin/python2.7 /home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py
Traceback (most recent call last):
File "/home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py", line 9, in <module>
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 83, in get_client
readonly=readonly)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 101, in _get_bq_service
service = build('bigquery', 'v2', http=http)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 196, in build
cache)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
resp, content = http.request(actual_url)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 565, in new_request
self._refresh(request_orig)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 835, in _refresh
self._do_refresh_request(http_request)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 862, in _do_refresh_request
body = self._generate_refresh_request_body()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1541, in _generate_refresh_request_body
assertion = self._generate_assertion()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1670, in _generate_assertion
private_key, self.private_key_password), payload)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/_pycrypto_crypt.py", line 121, in from_string
pkey = RSA.importKey(parsed_pem_key)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 665, in importKey
return self._importKeyDER(der)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 588, in _importKeyDER
raise ValueError("RSA key format is not supported")
ValueError: RSA key format is not supported
Process finished with exit code 1
My question: is there a tutorial in Python which shows how to communicate easily with BigQuery: importing a dataset from Google Storage or S3, querying something, and exporting the result to Google Storage?

A lot depends on your environment, and once you've figured that out, everything should be super simple. The only problem I see in the error log you pasted is authentication.
Python pandas has had support for BigQuery for a while:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.read_gbq.html
And I did a video with the creators of the module:
https://www.youtube.com/watch?v=gLeTDUMb7HY
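As a minimal sketch of the pandas route (the helper name is mine; newer pandas exposes read_gbq at the top level, while the docs link above shows the older pandas.io.gbq spelling):

```python
def query_to_dataframe(sql, project_id):
    """Run a BigQuery SQL query and return the result as a pandas DataFrame."""
    # Deferred import so the sketch can be read without pandas installed;
    # read_gbq also needs the pandas-gbq package and valid credentials.
    import pandas as pd
    return pd.read_gbq(sql, project_id=project_id)

# Usage (not run here): query_to_dataframe("SELECT 1 AS x", "my-project")
```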
Now, the simplest and fastest way these days to launch a Jupyter notebook with all of the Google Cloud goodies you mention is our new Google Datalab project:
https://cloud.google.com/datalab/
The only Datalab caveat is that it runs on cloud servers; but if you want a fully managed Jupyter/IPython environment that is secure, persistent, and ready to handle BigQuery, storage, etc., try it out.
Meanwhile, if you are writing a web application look at how other web applications solve this task.
For example, re:dash code to connect to BigQuery:
https://github.com/EverythingMe/redash/blob/master/redash/query_runner/big_query.py
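For completeness, here is roughly what the pipeline you describe (load from Cloud Storage, query, export back) looks like with the google-cloud-bigquery client. A minimal, untested sketch; the dataset, table, and object names are placeholders, and the SDK import is deferred so the URI helper can be read and tested on its own:

```python
def gcs_uri(bucket, path):
    """Build a gs:// URI for BigQuery load/extract jobs."""
    return "gs://%s/%s" % (bucket, path.lstrip("/"))

def load_query_export(project_id, bucket):
    # Deferred import: only needed when the pipeline actually runs.
    from google.cloud import bigquery
    client = bigquery.Client(project=project_id)

    # 1. Load a CSV from Cloud Storage into a table.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV, autodetect=True)
    client.load_table_from_uri(gcs_uri(bucket, "input/data.csv"),
                               "mydataset.newtable",
                               job_config=job_config).result()

    # 2. Query it.
    rows = list(client.query(
        "SELECT COUNT(*) AS n FROM mydataset.newtable").result())

    # 3. Export the table back to Cloud Storage.
    client.extract_table("mydataset.newtable",
                         gcs_uri(bucket, "output/result.csv")).result()
    return rows
```

Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS, so no PEM-key juggling is needed.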

google.auth.exceptions.RefreshError: 'No access token in response.'

Environment details
OS type and version:
Python version: 3.9.0
pip version: 22.0.4
google-api-python-client version: 2.48.0
Description
Hi, I'm running into an error when trying to fetch the Google Play Console reports of our mobile apps (such as installations, errors, etc.). I first tried with this manual, but it seems to be outdated and didn't work. So after some research I changed it to be similar to this one, so that it fits the current Google API (see code snippet below).
Steps I have done:
Created a project on "console.cloud.google.com"
Created the service account
Created the json key file
Invited the service account on play.google.com/console and gave it full admin rights (normally "see app information and download bulk reports" should be enough)
Added the role "Storage Object Viewer" to the Service account in https://console.cloud.google.com/iam-admin/iam?authuser=1&project=myproject
Waited 24 hours to make sure the errors were not caused by pending syncs or the like.
(I anonymized some of the values below).
Code example
from googleapiclient.discovery import build
from google.oauth2 import service_account
scopes = ['https://www.googleapis.com/auth/devstorage.read_only',
          'https://www.googleapis.com/auth/cloud-platform.read_only']
key_file_location = 'files/access_token/mykeyfile.json'
cloud_storage_bucket = r'pubsite_prod_rev_00123456789'
report_to_download = 'installs/installs_com.my.app_202201_country.csv'
creds = service_account.Credentials.from_service_account_file(
    key_file_location, scopes=scopes)
service = build('storage', 'v1', credentials=creds)
print(service.objects().get(bucket=cloud_storage_bucket,
                            object=report_to_download).execute())
Stack trace
Traceback (most recent call last):
File "C:\Users\myuser\project\z_10_ext_google_play_store.py", line 46, in <module>
print(service.objects().get(bucket = cloud_storage_bucket, object= report_to_download).execute())
File "D:\Programs\Python\lib\site-packages\googleapiclient\_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "D:\Programs\Python\lib\site-packages\googleapiclient\http.py", line 923, in execute
resp, content = _retry_request(
File "D:\Programs\Python\lib\site-packages\googleapiclient\http.py", line 191, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "D:\Programs\Python\lib\site-packages\google_auth_httplib2.py", line 209, in request
self.credentials.before_request(self._request, method, uri, request_headers)
File "D:\Programs\Python\lib\site-packages\google\auth\credentials.py", line 133, in before_request
self.refresh(request)
File "D:\Programs\Python\lib\site-packages\google\oauth2\service_account.py", line 410, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "D:\Programs\Python\lib\site-packages\google\oauth2\_client.py", line 199, in jwt_grant
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ('No access token in response.', {'id_token': 'eyJ...'})
I hope that I provided enough information and I'm sorry in advance if I made a stupid mistake.
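One way to narrow this kind of RefreshError down is to refresh the credentials directly, outside the API client: if the token grant itself is broken (clock skew, a revoked or malformed key), this fails the same way with a much shorter stack. A sketch; the helper names are mine, and the report-name helper just follows the installs/installs_com.my.app_202201_country.csv pattern from the snippet above:

```python
def report_object(package, yyyymm, dimension):
    """Build the object name of a Play Console installs report,
    e.g. installs/installs_com.my.app_202201_country.csv"""
    return "installs/installs_%s_%s_%s.csv" % (package, yyyymm, dimension)

def check_refresh(key_file, scopes):
    """Try the token grant on its own; raises the same RefreshError on failure."""
    # Deferred imports so this sketch can be read without the Google libraries.
    from google.oauth2 import service_account
    from google.auth.transport.requests import Request
    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=scopes)
    creds.refresh(Request())  # network call; raises RefreshError on a bad grant
    return creds.token
```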

Authentication to the Kubernetes API via Azure Active Directory (AKS)

I would like to use the Python kubernetes-client to connect to my AKS cluster API.
To do that, I tried the example given by kubernetes:
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
print("Listing pods with their IPs:")
ret = v1.list_pod_for_all_namespaces(watch=False)
for i in ret.items:
    print("%s\t%s\t%s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))
It is supposed to load my local kubeconfig and get a pod list, but I get the following error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
config.load_kube_config()
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 661, in load_kube_config
loader.load_and_set(config)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 469, in load_and_set
self._load_authentication()
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 203, in _load_authentication
if self._load_auth_provider_token():
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 221, in _load_auth_provider_token
return self._load_azure_token(provider)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 233, in _load_azure_token
self._refresh_azure_token(provider['config'])
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 253, in _refresh_azure_token
refresh_token, client_id, '00000002-0000-0000-c000-000000000000')
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 236, in acquire_token_with_refresh_token
return self._acquire_token(token_func)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 128, in _acquire_token
return token_func(self)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 234, in token_func
return token_request.get_token_with_refresh_token(refresh_token, client_secret)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 343, in get_token_with_refresh_token
return self._get_token_with_refresh_token(refresh_token, None, client_secret)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 340, in _get_token_with_refresh_token
return self._oauth_get_token(oauth_parameters)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 112, in _oauth_get_token
return client.get_token(oauth_parameters)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/oauth2_client.py", line 291, in get_token
raise AdalError(return_error_string, error_response)
adal.adal_error.AdalError: Get Token request returned http error: 400
and server response:
{"error":"invalid_grant","error_description":"AADSTS65001: The user or administrator has not consented to use the application with ID '' named 'Kubernetes AD Client'. Send an interactive authorization request for this user and resource.\r\nTrace ID: \r\nCorrelation ID: \r\nTimestamp: 2019-10-14 12:32:35Z","error_codes":[65001],"timestamp":"2019-10-14 12:32:35Z","trace_id":"","correlation_id":"","suberror":"consent_required"}
I really don't understand why it doesn't work.
When I use kubectl, everything works fine.
I read some docs, but I'm not sure I understand the adal error.
Thanks for your help
Log in as a tenant admin to https://portal.azure.com
Open the registration for your app in the
Go to Settings, then Required Permissions
Press the Grant Permissions button
If you are not a tenant admin, you cannot give admin consent.
From https://github.com/Azure-Samples/active-directory-angularjs-singlepageapp-dotnet-webapi/issues/19
This is a good post where you can find a snippet to authenticate to AKS:
from azure.identity import AzureCliCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.containerservice import ContainerServiceClient

credential = AzureCliCredential()
subscription_id = "XXXXX"
resource_group = 'MY-RG'
resource_client = ResourceManagementClient(credential, subscription_id)
container_client = ContainerServiceClient(credential, subscription_id)
resource_list = resource_client.resources.list_by_resource_group(resource_group)
Note: you need to install the respective Azure Python SDK libraries.
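Building on that, ContainerServiceClient can also hand you a kubeconfig for the cluster directly, which sidesteps the interactive-consent problem as long as the CLI login has access. A sketch (the function name and file handling are my assumptions; the SDK import is deferred so the sketch is readable without the Azure libraries installed):

```python
def get_aks_kubeconfig(credential, subscription_id, resource_group, cluster_name):
    """Fetch the user kubeconfig for an AKS cluster as a string."""
    # Deferred import: only needed when actually talking to Azure.
    from azure.mgmt.containerservice import ContainerServiceClient
    client = ContainerServiceClient(credential, subscription_id)
    result = client.managed_clusters.list_cluster_user_credentials(
        resource_group, cluster_name)
    # The API returns the kubeconfig as bytes.
    return result.kubeconfigs[0].value.decode("utf-8")
```

Writing that string to a file and pointing config.load_kube_config(config_file=...) at it then gives the kubernetes client a working context.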

client.get_bucket() fails, but only from Cloud Dataflow (Compute Engine)

The application has been working normally; now, on a re-deploy, Google Storage is giving strange errors:
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl?
File "/usr/local/lib/python2.7/dist-packages/lib/file_store.py", line 11, in __init__
self.bucket = self.client.get_bucket(parts[0])
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/client.py", line 301, in get_bucket
bucket.reload(client=self)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/_helpers.py", line 130, in reload
_target_object=self,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 392, in api_request
target_object=_target_object,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 269, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 298, in _do_request
return self.http.request(url=url, method=method, headers=headers, data=data)
File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 208, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 519, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 313, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 387, in prepare_url
raise MissingSchema(error)
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl? [while running 'generatedPtransform-51']
The code causing the error is below; I can run it locally using the same service account and it works with no error. I am using $env:GOOGLE_APPLICATION_CREDENTIALS to export my service account credentials at deploy time. All other services are working normally.
# My test is:
# fs = FileStore("gs://my-bucket-name/models/", "development", "general")
class FileStore():
    # modelPath - must be a gs:// style google storage resource path
    # containing everything but the file extension
    def __init__(self, modelPath, env, modelName):
        from google.cloud import storage
        parts = modelPath[5:].split('/', 1)
        self.client = storage.Client()
        self.bucket = self.client.get_bucket(parts[0])  # <- error here
Why would google core client fail to build a URL? Based on 'None/storage/v1/b/my-bucket-name?projection=noAcl', the missing part of the URL should be something like "https://www.googleapis.com".
This error is apparently caused by a version mismatch between google-cloud-storage and google-cloud-core. I had specified google-cloud-core >= 1.0.3 in my setup.py, but when I looked at the Docker image on the Compute Engine VM I found it had an earlier version.
After rebuilding my venv from setup.py I also had to run:
C:\Python27\python.exe -m pipenv install "google-cloud-core>=1.0.3" --skip-lock
Then I was able to deploy and the application started working again.
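As an aside, the prefix-slicing in FileStore.__init__ is easy to get subtly wrong; a small defensive parser (name hypothetical) makes the failure mode explicit instead of producing a confusing downstream error:

```python
def split_gs_path(gs_path):
    """Split 'gs://bucket/some/prefix' into (bucket, prefix)."""
    if not gs_path.startswith("gs://"):
        raise ValueError("not a gs:// path: %r" % gs_path)
    # partition keeps everything after the first '/' as the object prefix
    bucket, _, prefix = gs_path[5:].partition("/")
    if not bucket:
        raise ValueError("missing bucket in: %r" % gs_path)
    return bucket, prefix
```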

YouTube API: Python search is not working, throws "Application Default Credentials are not available"

Trying to search youtube videos using this code: https://github.com/youtube/api-samples/blob/master/python/search.py
But whatever I try, I get the error below even though I provided the API key:
Traceback (most recent call last):
File "search.py", line 56, in <module>
youtube_search(args)
File "search.py", line 18, in youtube_search
developerKey=DEVELOPER_KEY)
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\oauth2client\_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\googleapiclient\discovery.py", line 226, in build
credentials=credentials)
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\oauth2client\_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\googleapiclient\discovery.py", line 358, in build_from_document
credentials = _auth.default_credentials()
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\googleapiclient\_auth.py", line 40, in default_credentials
return oauth2client.client.GoogleCredentials.get_application_default()
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\oauth2client\client.py", line 1264, in get_application_default
return GoogleCredentials._get_implicit_credentials()
File "C:\Python27\Scripts\tvapp_env\lib\site-packages\oauth2client\client.py", line 1254, in _get_implicit_credentials
raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
oauth2client.client.ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Please help; I'm running this from a Windows machine.
Or is there better code for making a YouTube search?
Thanks
It seems you are not running the app in either Google App Engine or Google Compute Engine, so there are no default credentials to use. You have to download the credentials from the Google Developers Console first and use them in your application. Please refer to this document for further details.
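For a plain search, an API key alone is enough; no OAuth flow or default credentials are needed. A sketch along the lines of the linked sample (the title-extraction helper is mine, and the discovery import is deferred so the helper can be tested without google-api-python-client installed):

```python
def video_titles(response):
    """Pull the video titles out of a search().list() response dict."""
    return [item["snippet"]["title"] for item in response.get("items", [])]

def youtube_search(api_key, query, max_results=5):
    # Deferred import: only needed for the actual API call.
    from googleapiclient.discovery import build
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.search().list(q=query, part="id,snippet",
                                     maxResults=max_results).execute()
    return video_titles(response)
```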

Using a service account with boto to access Cloud Storage in GAE with gcs_oauth2_boto_plugin

I am wondering if anyone knows a way to use a service account to authenticate when accessing data in Cloud Storage by:
1. Using the boto library (and gcs_oauth2_boto_plugin)
2. Running in Google App Engine (GAE)
Following https://developers.google.com/storage/docs/gspythonlibrary I am using boto and gcs_oauth2_boto_plugin to authenticate and perform actions against Cloud Storage (upload/download files). I am using a service account to authenticate so that we don't have to authenticate with a Google account periodically (the thought being that if we run this in GCE, it'll run with the GCE service account -- haven't actually done that yet). Locally, I've set up my boto config file to use the service account and point to a p12 key file. This runs fine locally.
I would like to use the same code to interact with Cloud Storage from within Google App Engine (GAE). We are running a lightweight ETL process that transforms the data and loads it into BigQuery. We want to run this code in an App Engine task queue (the task is triggered by an Object Change Notification from Cloud Storage).
Since we're currently relying on the boto config (~/.boto), I adapted http://thurloat.com/2010/06/07/google-storage-and-app-engine to put the relevant config items for a service account.
When I finally run the code from App Engine (dev_appserver.py), I get the below stack trace:
Traceback (most recent call last):
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/home/some-user/google-cloud-sdk/platform/google_appengine/lib/webapp2-2.5.1/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/home/some-user/dev/myApp/main.py", line 247, in post
gs.download(fname, fp)
File "/home/some-user/dev/myApp/cloudstorage.py", line 107, in download
bytes = src_uri.get_key().get_contents_to_file(fp)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 336, in get_key
bucket = self.get_bucket(validate, headers)
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 181, in get_bucket
conn = self.connect()
File "/home/some-user/dev/myApp/boto/storage_uri.py", line 140, in connect
**connection_args)
File "/home/some-user/dev/myApp/boto/gs/connection.py", line 47, in __init__
suppress_consec_slashes=suppress_consec_slashes)
File "/home/some-user/dev/myApp/boto/s3/connection.py", line 190, in __init__
validate_certs=validate_certs, profile_name=profile_name)
File "/home/some-user/dev/myApp/boto/connection.py", line 568, in __init__
host, config, self.provider, self._required_auth_capability())
File "/home/some-user/dev/myApp/boto/auth.py", line 929, in get_auth_handler
ready_handlers.append(handler(host, config, provider))
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_plugin.py", line 56, in __init__
cred_type=oauth2_client.CredTypes.OAUTH2_SERVICE_ACCOUNT)
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_helper.py", line 48, in OAuth2ClientFromBotoConfig
token_cache = oauth2_client.FileSystemTokenCache()
File "/home/some-user/dev/myApp/gcs_oauth2_boto_plugin/oauth2_client.py", line 175, in __init__
tempfile.gettempdir(), 'oauth2_client-tokencache.%(uid)s.%(key)s')
File "/home/some-user/google-cloud-sdk/platform/google_appengine/google/appengine/dist/tempfile.py", line 61, in PlaceHolder
raise NotImplementedError("Only tempfile.TemporaryFile is available for use")
NotImplementedError: Only tempfile.TemporaryFile is available for use
It looks like the problem is just gcs_oauth2_boto_plugin trying to use a temporary directory when caching the OAuth credentials (App Engine only supports tempfile.TemporaryFile).
Rather than try and patch gcs_oauth2_boto_plugin, is there potentially another solution? Can we use a service account with gcs_oauth2_boto_plugin/boto on App Engine to access Cloud Storage resources?
Or, am I using the wrong authentication method here?
This doesn't quite answer the question directly, but instead of using boto and gcs_oauth2_boto_plugin, I am using the "Google Cloud Storage Python Client Library", GoogleAppEngineCloudStorageClient from pip.
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/
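That client works on '/bucket/object' paths rather than gs:// URIs, so the main adaptation needed from the boto code is the path shape. A sketch (the conversion helper and function names are mine; cloudstorage is only importable inside the App Engine runtime, hence the deferred import):

```python
def to_cloudstorage_path(gs_uri):
    """Convert 'gs://bucket/object' to the '/bucket/object' form
    that the GAE cloudstorage library expects."""
    if not gs_uri.startswith("gs://"):
        raise ValueError("not a gs:// URI: %r" % gs_uri)
    return "/" + gs_uri[5:]

def download(gs_uri, fp):
    # Deferred import: cloudstorage is only available inside App Engine.
    import cloudstorage
    with cloudstorage.open(to_cloudstorage_path(gs_uri)) as src:
        fp.write(src.read())
```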
