I'm trying to access the BigQuery API for the first time using Python, following the tutorial guide here. My ultimate goal is to create datasets, but I'm having trouble even with basic access to public data.
I've created a service account with the following roles:
BigQuery Admin
BigQuery Data Owner
Using this service account, I generated a private key (.json), stored it on my local machine, and reference it through the gcp_key_path variable.
However, I run into the following JWT error when running the lines below. A GitHub issue at https://github.com/googleapis/google-cloud-python/issues/8736#event-2510884623 suggests that JWT token expiration may be the cause. I'm not sure how to check for this, or whether the expiration is set automatically when a service account makes the request. I checked my system time (macOS Catalina 10.15.6) and nothing seems obviously off.
If someone has run into this before, I would greatly appreciate a layman's explanation of what is going on, and guidance (or a pointer to docs) that can help me learn how to fix this issue.
import os
from google.cloud import bigquery
table_id = "[project_name].[file_name]"
file_path = '/path/to/my/data_file.json'
gcp_key_path = '/path/to/my/service/account/private/key.json'
# set the environment variable explicitly
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = gcp_key_path
# Construct a BigQuery client object.
client = bigquery.Client.from_service_account_json(gcp_key_path)
dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 674, in get_dataset
timeout=timeout,
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 637, in _call_api
return call()
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 431, in api_request
timeout=timeout,
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 289, in _make_request
method, url, headers, data, target_object, timeout=timeout
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 327, in _do_request
url=url, method=method, headers=headers, data=data, timeout=timeout
File "/usr/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 460, in request
self.credentials.before_request(auth_request, method, url, request_headers)
File "/usr/local/lib/python3.7/site-packages/google/auth/credentials.py", line 133, in before_request
self.refresh(request)
File "/usr/local/lib/python3.7/site-packages/google/oauth2/service_account.py", line 361, in refresh
access_token, expiry, _ = _client.jwt_grant(request, self._token_uri, assertion)
File "/usr/local/lib/python3.7/site-packages/google/oauth2/_client.py", line 153, in jwt_grant
response_data = _token_endpoint_request(request, token_uri, body)
File "/usr/local/lib/python3.7/site-packages/google/oauth2/_client.py", line 124, in _token_endpoint_request
_handle_error_response(response_body)
File "/usr/local/lib/python3.7/site-packages/google/oauth2/_client.py", line 60, in _handle_error_response
raise exceptions.RefreshError(error_details, response_body)
google.auth.exceptions.RefreshError: ('invalid_grant: Invalid JWT Signature.', '{"error":"invalid_grant","error_description":"Invalid JWT Signature."}')
Here is the output of pip freeze, in case any libraries are outdated:
beautifulsoup4==4.9.1
bs4==0.0.1
cachetools==4.1.0
certifi==2020.4.5.1
cffi==1.14.3
chardet==3.0.4
click==7.1.2
Flask==1.1.2
google-api-core==1.22.4
google-api-python-client==1.9.3
google-auth==1.22.1
google-auth-httplib2==0.0.3
google-cloud-bigquery==2.1.0
google-cloud-core==1.4.3
google-cloud-storage==1.29.0
google-crc32c==1.0.0
google-resumable-media==1.1.0
googleapis-common-protos==1.52.0
grpcio==1.32.0
httplib2==0.18.1
idna==2.9
itsdangerous==1.1.0
Jinja2==2.11.2
MarkupSafe==1.1.1
proto-plus==1.10.1
protobuf==3.12.2
psycopg2==2.8.5
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pytz==2020.1
requests==2.23.0
rsa==4.6
six==1.15.0
soupsieve==2.0.1
SQLAlchemy==1.3.18
uritemplate==3.0.1
urllib3==1.25.9
Werkzeug==1.0.1
Found my dumb error: I forgot to enable the BigQuery API service, which is step (2) in the quickstart tutorial.
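The traceback in the question fails inside jwt_grant, that is, while the signed JWT is being exchanged for an access token and before any BigQuery call is made. One thing that can be ruled out locally is a damaged or wrong key file. The following is a minimal sketch, assuming the standard JSON layout of a service account key; check_key_file and REQUIRED_FIELDS are hypothetical helper names, and the check says nothing about whether the key is still valid on Google's side.

```python
import json

# Fields found in a Google service account key file (JSON format).
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def check_key_file(path):
    """Return a list of human-readable problems found in the key file (empty if none)."""
    with open(path, encoding="utf-8") as fh:
        info = json.load(fh)

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not info.get(name)]

    # A user or OAuth client file has a different "type" and cannot sign service account JWTs.
    if info.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {info['type']!r}")

    # The private key is stored as a PEM block inside the JSON string.
    key = info.get("private_key", "")
    if key and "PRIVATE KEY-----" not in key:
        problems.append("private_key does not look like a PEM block")
    return problems
```

Calling check_key_file(gcp_key_path) with the path from the question returns an empty list when the file has the expected shape. A clean result only narrows the search; it does not prove the signature will be accepted.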
I don't use Hugging Face much for my AI work, but I discovered that you can train a model with it, so I tried to start a training job from my machine. I keep getting this error:
PS C:\Users\gboss\OneDrive\Bureau\Ai training> & C:/Users/gboss/AppData/Local/Programs/Python/Python310/python.exe "c:/Users/gboss/OneDrive/Bureau/Ai training/AiTraining.py"
Traceback (most recent call last):
File "c:\Users\gboss\OneDrive\Bureau\Ai training\AiTraining.py", line 8, in <module>
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 514, in _api_call
return self._make_api_call(operation_name, kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 921, in _make_api_call
http, parsed_response = self._make_request(
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\client.py", line 944, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\hooks.py", line 239, in _emit
response = handler(**kwargs)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\signers.py", line 105, in handler
return self.sign(operation_name, request)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\signers.py", line 189, in sign
auth.add_auth(request)
File "C:\Users\gboss\AppData\Local\Programs\Python\Python310\lib\site-packages\botocore\auth.py", line 418, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
I can't figure out why, since I don't use Hugging Face much.
The code:
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFace
# gets role for executing training job
iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']
hyperparameters = {
    'model_name_or_path': 'ZipperXYZ/DialoGPT-medium-TheWorldMachineExpressive2',
    'output_dir': '/opt/ml/model'
    # add your remaining hyperparameters
    # more info here https://github.com/huggingface/transformers/tree/v4.17.0/examples/pytorch/language-modeling
}
# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.17.0'}
# creates Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',
    source_dir='./examples/pytorch/language-modeling',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    hyperparameters=hyperparameters
)
# starting the train job
huggingface_estimator.fit()
You first need to configure your credentials; it's not an error in your code. Follow this thread.
TLDR:
NoCredentialsError: Unable to locate credentials
Reply: Yes, let me explain this; it is a bit complicated to get started. The IAM role you created is used inside the SageMaker training job or inference endpoint. That is, the role is used to, for example, download your data from S3, and it is needed to start the underlying machine.
The error NoCredentialsError: Unable to locate credentials is shown when you don't have credentials configured on your machine. You need to run aws configure to set up IAM credentials (user) on your machine. You need these credentials to start your training job.
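To see whether the machine has any static credentials at all before re-running the training script, a small standard-library check can help. This is only a sketch: it looks at the two places that exported environment variables and aws configure populate, and ignores the other sources boto3 can use (SSO, instance roles, and so on). The function name find_aws_credentials is hypothetical.

```python
import configparser
import os
from pathlib import Path


def find_aws_credentials(environ=None, home=None, profile="default"):
    """Return a short description of where static AWS credentials were found, or None."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # 1. Environment variables work without any files on disk.
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    # 2. The shared credentials file, which is where `aws configure` stores the keys.
    cred_file = Path(environ.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials"))
    if cred_file.is_file():
        parser = configparser.ConfigParser()
        parser.read(cred_file)
        if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
            return f"profile '{profile}' in {cred_file}"

    return None


if __name__ == "__main__":
    print(find_aws_credentials() or "no static credentials found; try running `aws configure`")
```

If this prints that nothing was found, the NoCredentialsError from boto3.client('iam') is expected, regardless of which IAM role name is passed to get_role.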
Environment details
OS type and version:
Python version: 3.9.0
pip version: 22.0.4
google-api-python-client version: 2.48.0
Description
Hi, I'm running into an error when trying to fetch the Google Play Console reports for our mobile apps (installations, errors, etc.). I first tried following this manual, but it seems to be outdated and didn't work. After some research I adapted the code along the lines of this one so that it fits the current Google API (see code snippet below).
Steps I have done:
Created a project on "console.cloud.google.com"
Created the service account
Created the json key file
Invited the service account on play.google.com/console and gave it full admin rights (normally "see app information and download bulk reports" should be enough)
Added the role "Storage Object Viewer" to the service account in https://console.cloud.google.com/iam-admin/iam?authuser=1&project=myproject
Waited 24 hours to rule out errors caused by sync delays.
(I anonymized some of the values below).
Code example
from googleapiclient.discovery import build
from google.oauth2 import service_account
scopes = ['https://www.googleapis.com/auth/devstorage.read_only','https://www.googleapis.com/auth/cloud-platform.read_only']
key_file_location = 'files/access_token/mykeyfile.json'
cloud_storage_bucket = r'pubsite_prod_rev_00123456789'
report_to_download = 'installs/installs_com.my.app_202201_country.csv'
creds = service_account.Credentials.from_service_account_file(key_file_location,scopes=scopes)
service = build('storage','v1', credentials=creds)
print(service.objects().get(bucket = cloud_storage_bucket, object= report_to_download).execute())
Stack trace
Traceback (most recent call last):
File "C:\Users\myuser\project\z_10_ext_google_play_store.py", line 46, in <module>
print(service.objects().get(bucket = cloud_storage_bucket, object= report_to_download).execute())
File "D:\Programs\Python\lib\site-packages\googleapiclient\_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "D:\Programs\Python\lib\site-packages\googleapiclient\http.py", line 923, in execute
resp, content = _retry_request(
File "D:\Programs\Python\lib\site-packages\googleapiclient\http.py", line 191, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "D:\Programs\Python\lib\site-packages\google_auth_httplib2.py", line 209, in request
self.credentials.before_request(self._request, method, uri, request_headers)
File "D:\Programs\Python\lib\site-packages\google\auth\credentials.py", line 133, in before_request
self.refresh(request)
File "D:\Programs\Python\lib\site-packages\google\oauth2\service_account.py", line 410, in refresh
access_token, expiry, _ = _client.jwt_grant(
File "D:\Programs\Python\lib\site-packages\google\oauth2\_client.py", line 199, in jwt_grant
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ('No access token in response.', {'id_token': 'eyJ...'})
I hope that I provided enough information and I'm sorry in advance if I made a stupid mistake.
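The error above shows that the token endpoint returned an id_token instead of an access token. That value is a JWT, so its claims can be read locally to see what was actually issued (issuer, audience, expiry). The sketch below decodes the payload without verifying the signature, so it is for inspection only; decode_jwt_payload is a hypothetical helper name, and the result does not by itself explain why no access token was returned.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the claims of a JWT without verifying its signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # base64url encoding drops '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

In the script from the question, the token could be taken from the second element of the exception's args after catching google.auth.exceptions.RefreshError.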
I would like to use the Python kubernetes-client to connect to the API of my AKS cluster.
To do that, I am trying the example given by Kubernetes:
from kubernetes import client, config

# Load cluster and credential settings from the local kubeconfig file.
config.load_kube_config()
v1 = client.CoreV1Api()
print("Listing pods with their IPs:")
ret = v1.list_pod_for_all_namespaces(watch=False)
for i in ret.items:
    print("%s\t%s\t%s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))
It is supposed to load my local kubeconfig and list the pods, but I get the following error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
config.load_kube_config()
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 661, in load_kube_config
loader.load_and_set(config)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 469, in load_and_set
self._load_authentication()
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 203, in _load_authentication
if self._load_auth_provider_token():
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 221, in _load_auth_provider_token
return self._load_azure_token(provider)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 233, in _load_azure_token
self._refresh_azure_token(provider['config'])
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/kubernetes/config/kube_config.py", line 253, in _refresh_azure_token
refresh_token, client_id, '00000002-0000-0000-c000-000000000000')
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 236, in acquire_token_with_refresh_token
return self._acquire_token(token_func)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 128, in _acquire_token
return token_func(self)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/authentication_context.py", line 234, in token_func
return token_request.get_token_with_refresh_token(refresh_token, client_secret)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 343, in get_token_with_refresh_token
return self._get_token_with_refresh_token(refresh_token, None, client_secret)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 340, in _get_token_with_refresh_token
return self._oauth_get_token(oauth_parameters)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/token_request.py", line 112, in _oauth_get_token
return client.get_token(oauth_parameters)
File "/Users//works/test-kube-api-python/env/lib/python2.7/site-packages/adal/oauth2_client.py", line 291, in get_token
raise AdalError(return_error_string, error_response)
adal.adal_error.AdalError: Get Token request returned http error: 400 and server response: {"error":"invalid_grant","error_description":"AADSTS65001: The user or administrator has not consented to use the application with ID '' named 'Kubernetes AD Client '. Send an interactive authorization request for this user and resource.\r\nTrace ID: \r\nCorrelation ID: \r\nTimestamp: 2019-10-14 12:32:35Z","error_codes":[65001],"timestamp":"2019-10-14 12:32:35Z","trace_id":"","correlation_id":"","suberror":"consent_required"}
I really don't understand why it doesn't work.
When I use kubectl, everything works fine.
I read some docs, but I'm not sure I understand the ADAL error.
Thanks for your help.
Log in as a tenant admin to https://portal.azure.com
Open the registration for your app in the
Go to Settings then Required Permissions
Press the Grant Permissions button
If you are not a tenant admin, you cannot give admin consent
From https://github.com/Azure-Samples/active-directory-angularjs-singlepageapp-dotnet-webapi/issues/19
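The server response in the question is JSON, and the fields that identify the failure are error_codes (65001) and suberror (consent_required), which is the condition the consent steps above address. As a rough sketch, the relevant fields can be pulled out of such a response like this; summarize_aad_error is a hypothetical helper name, and the set of fields is taken only from the response shown in the question.

```python
import json


def summarize_aad_error(response_text):
    """Extract the fields of an Azure AD token error response that identify the failure."""
    data = json.loads(response_text)
    description = data.get("error_description", "")
    codes = data.get("error_codes", [])
    return {
        "error": data.get("error"),
        "codes": codes,
        "suberror": data.get("suberror"),
        # The first line of the description carries the AADSTS message itself;
        # the remaining lines are trace and correlation metadata.
        "message": description.splitlines()[0] if description else "",
        "needs_consent": data.get("suberror") == "consent_required" or 65001 in codes,
    }
```

When needs_consent is true, retrying the token refresh from the script will keep failing until consent has been granted for the application.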
This is a good post where you can find a snippet to authenticate to AKS:
from azure.identity import AzureCliCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.containerservice import ContainerServiceClient
from azure.mgmt.containerservice.models import (ManagedClusterAgentPoolProfile,
ManagedCluster)
# Reuse the login of the Azure CLI (`az login`) as the credential.
credential = AzureCliCredential()
subscription_id = "XXXXX"
resource_group = 'MY-RG'
resource_client = ResourceManagementClient(credential, subscription_id)
container_client = ContainerServiceClient(credential, subscription_id)
resource_list = resource_client.resources.list_by_resource_group(resource_group)
Note: You need to install the respective Azure Python SDK libraries.
The application had been working normally, but after a re-deploy, Google Cloud Storage is giving strange errors.
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl?
File "/usr/local/lib/python2.7/dist-packages/lib/file_store.py", line 11, in __init__
self.bucket = self.client.get_bucket(parts[0])
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/client.py", line 301, in get_bucket
bucket.reload(client=self)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/_helpers.py", line 130, in reload
_target_object=self,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 392, in api_request
target_object=_target_object,
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 269, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 298, in _do_request
return self.http.request(url=url, method=method, headers=headers, data=data)
File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 208, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 519, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 313, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 387, in prepare_url
raise MissingSchema(error)
MissingSchema: Invalid URL 'None/storage/v1/b/my-bucket-name?projection=noAcl': No schema supplied. Perhaps you meant http://None/storage/v1/b/my-bucket-name?projection=noAcl? [while running 'generatedPtransform-51']
Below is the code causing the error. I can run it locally using the same service account and it works without error. I am using $env:GOOGLE_APPLICATION_CREDENTIALS to export my service account credentials at deploy time. All other services are working normally.
# My test is:
# fs = FileStore("gs://my-bucket-name/models/", "development", "general")
class FileStore():
    # modelPath - must be a gs:// style google storage resource path containing everything but the file extension
    def __init__(self, modelPath, env, modelName):
        from google.cloud import storage
        # Strip the leading "gs://" and split into bucket name and object prefix.
        parts = modelPath[5:].split('/', 1)
        self.client = storage.Client()
        self.bucket = self.client.get_bucket(parts[0])  # <- error here
Why would the Google core client fail to build a URL? Based on 'None/storage/v1/b/my-bucket-name?projection=noAcl', the missing part of the URL should be something like "https://www.googleapis.com".
This error is apparently caused by a version mismatch between google-cloud-storage and google-cloud-core. I had specified google-cloud-core >= 1.0.3 in my setup.py, but when I looked at the Docker image on the compute VM I found it had an earlier version.
After rebuilding my venv from setup.py I had to also run:
C:\Python27\python.exe -m pipenv install "google-cloud-core>=1.0.3" --skip-lock
Then I was able to deploy and the application started working again.
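Since the mismatch only showed up on the deployed image, it can be worth logging the installed versions at startup and comparing them with what setup.py asks for. The sketch below is one way to do that, assuming Python 3.8 or later for importlib.metadata (the original deployment ran Python 2.7, so this is not a drop-in for that environment). The helper names are hypothetical, and the version parsing is deliberately simple: it compares only the leading numeric components.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.0.3' into (1, 0, 3); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the dotted version `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The minimum below is the one named in the answer above.
    for name, minimum in [("google-cloud-core", "1.0.3")]:
        found = installed_version(name)
        ok = found is not None and meets_minimum(found, minimum)
        print(f"{name}: installed={found} required>={minimum} ok={ok}")
```

Running this inside the container rather than on the development machine is the point: the local environment in the question already had a working combination.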
I want to perform some very simple tasks on BigQuery from a Python script. I found this package, but it does not work well. When I try this code:
from bigquery import get_client
project_id = 'txxxxxxxxxxxxxxxxxx9'
# Service account email address as listed in the Google Developers Console.
service_account = '7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
# PKCS12 or PEM key provided by Google.
key = '/home/fxxxxxxxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxxxxxxxxxxxxxxx.pem'
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
# Fetch the schema of an existing table.
results = client.get_table_schema('newdataset', 'newtable2')
print(results)
I get this error:
/home/xxxxxx/anaconda3/envs/snakes/bin/python2.7 /home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py
Traceback (most recent call last):
File "/home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py", line 9, in <module>
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 83, in get_client
readonly=readonly)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 101, in _get_bq_service
service = build('bigquery', 'v2', http=http)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 196, in build
cache)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
resp, content = http.request(actual_url)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 565, in new_request
self._refresh(request_orig)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 835, in _refresh
self._do_refresh_request(http_request)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 862, in _do_refresh_request
body = self._generate_refresh_request_body()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1541, in _generate_refresh_request_body
assertion = self._generate_assertion()
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1670, in _generate_assertion
private_key, self.private_key_password), payload)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/_pycrypto_crypt.py", line 121, in from_string
pkey = RSA.importKey(parsed_pem_key)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 665, in importKey
return self._importKeyDER(der)
File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 588, in _importKeyDER
raise ValueError("RSA key format is not supported")
ValueError: RSA key format is not supported
Process finished with exit code 1
My question: is there a Python tutorial that shows how to work with BigQuery easily, that is, importing a dataset from Google Cloud Storage or S3, running a query, and exporting the result to Google Cloud Storage?
A lot depends on your environment, and once you've figured that out everything should be super simple. The only problem I see in the error log you pasted is authentication.
Python pandas has had support for BigQuery for a while:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.read_gbq.html
And I did a video with the creators of the module:
https://www.youtube.com/watch?v=gLeTDUMb7HY
Now, the simplest and fastest way these days to launch a Jupyter notebook with all of the Google Cloud goodies you mention is our new Google Datalab project:
https://cloud.google.com/datalab/
The only Datalab caveat is that it works on cloud servers, but if you want a fully managed Jupyter/IPython environment, totally secure, persistent, and ready to handle BigQuery, storage, etc... try it out.
Meanwhile, if you are writing a web application, look at how other web applications solve this task.
For example, re:dash code to connect to BigQuery:
https://github.com/EverythingMe/redash/blob/master/redash/query_runner/big_query.py
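Regarding the "RSA key format is not supported" error in the question: the ValueError is raised while the private key file is being parsed, before any request is sent. A quick way to see what kind of file is being passed is to look at its PEM header. The sketch below only reports what the file contains; which formats a given crypto backend accepts depends on the library and its version, so treat the output as a hint. The helper names pem_labels and describe_key_file are hypothetical.

```python
import re

# Matches PEM block headers such as "-----BEGIN RSA PRIVATE KEY-----".
_PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_labels(text):
    """Return the labels of all PEM blocks in `text`, e.g. ['RSA PRIVATE KEY']."""
    return _PEM_HEADER.findall(text)


def describe_key_file(path):
    """Give a one-line description of what kind of key material a file appears to hold."""
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # PEM is plain ASCII, so anything else is a binary container.
        return "binary file (not PEM; possibly DER or PKCS#12)"
    labels = pem_labels(text)
    if not labels:
        return "text file without a PEM header"
    return "PEM blocks: " + ", ".join(labels)
```

For example, describe_key_file(key) with the path from the question shows whether the .pem file really contains a private key block or something else, such as a certificate or a renamed binary file.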