bigquery, extract_table AttributeError: 'Client' object has no attribute 'dataset' - python

My question is about code that extracts a table from BigQuery and saves it as a JSON file.
I wrote the code mostly by following the Google Cloud tutorials in their documentation.
I couldn't set my credentials implicitly, so I set them explicitly by pointing to my JSON key file. But it seems the "Client" object I get this way is not the right one.
If anyone could clarify how implicit and explicit credentials work, that would help me a lot too!
I am using Python 2.7 and PyCharm. The code is as follows:
from gcloud import bigquery
from google.cloud import storage
def bigquery_get_rows():
    json_key = "path/to/my/json_file.json"
    storage_client = storage.Client.from_service_account_json(json_key)
    print("\nGot the client\n")
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
    print(storage_client)
    # Setting up the environment
    bucket_name = 'my_bucket/name'
    print(bucket_name)
    destination_uri = 'gs://{}/{}'.format(bucket_name, 'my_table_json_name.json')
    print(destination_uri)
    #dataset_ref = client.dataset('samples', project='my_project_name')
    dataset_ref = storage_client.dataset('my_dataset_name', project='my_project_id')
    print(dataset_ref)
    table_ref = dataset_ref.table('my_table_to_be_extracted_name')
    print(table_ref)
    job_config = bigquery.job.ExtractJobConfig()
    job_config.destination_format = (
        bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)
    extract_job = client.extract_table(
        table_ref, destination_uri, job_config=job_config)  # API request
    extract_job.result()  # Waits for job to complete.
bigquery_get_rows()

You are using the wrong client object: you are trying to use the Cloud Storage (GCS) client to work with BigQuery.
Instead of
dataset_ref = storage_client.dataset('my_dataset_name', project='my_project_id')
it should be:
bq_client = bigquery.Client.from_service_account_json(
    'path/to/service_account.json')
dataset_ref = bq_client.dataset('my_dataset_name', project='my_project_id')
The later client.extract_table(...) call should use bq_client as well, since client is never defined in your code.
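Putting the pieces together, the sketch below shows one way the export could look with a BigQuery client, and how explicit and implicit credentials differ. The helper names (`build_destination_uri`, `make_bigquery_client`, `extract_table_to_json`) are made up for illustration, and the code assumes a reasonably recent `google-cloud-bigquery` package rather than the older `gcloud` package imported in the question.

```python
def build_destination_uri(bucket_name, object_name):
    """Return a gs:// URI for the export target.

    Cloud Storage bucket names cannot contain '/', so any folder-like
    prefix belongs in object_name instead.
    """
    if "/" in bucket_name:
        raise ValueError("bucket name must not contain '/': %r" % bucket_name)
    return "gs://{}/{}".format(bucket_name, object_name.lstrip("/"))


def make_bigquery_client(json_key_path=None):
    """Create a BigQuery client with explicit or implicit credentials."""
    # Imported lazily so the pure helper above works without the library.
    from google.cloud import bigquery

    if json_key_path:
        # Explicit: point the client at a service account key file.
        return bigquery.Client.from_service_account_json(json_key_path)
    # Implicit: Application Default Credentials, for example the key file
    # named by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    return bigquery.Client()


def extract_table_to_json(client, project, dataset_id, table_id, destination_uri):
    """Export a table to Cloud Storage as newline-delimited JSON."""
    from google.cloud import bigquery

    table_ref = bigquery.DatasetReference(project, dataset_id).table(table_id)
    job_config = bigquery.ExtractJobConfig()
    job_config.destination_format = (
        bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)
    extract_job = client.extract_table(
        table_ref, destination_uri, job_config=job_config)  # API request
    return extract_job.result()  # Waits for the job to complete.
```

A call might look like `extract_table_to_json(make_bigquery_client("path/to/key.json"), "my_project_id", "my_dataset_name", "my_table", build_destination_uri("my_bucket", "my_table.json"))`, where all of the argument values are placeholders.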

Related

How to generate azure storage container SAS url using python

I am unable to copy a blob (error: CopySourceNotVerified) using the SAS URL generated dynamically with this code, which I found in "How can I generate an Azure blob SAS URL in Python?":
account_name = 'account'
account_key = 'key'
container_name = 'feed'
blob_name = 'standard_feed'
requiredSASToken = generate_blob_sas(account_name=account_name,
                                     container_name=container_name,
                                     blob_name=blob_name,
                                     account_key=account_key,
                                     permission=BlobSasPermissions(read=True),
                                     expiry=datetime.utcnow() + timedelta(hours=24))
However, I am able to copy the blob when I use a SAS token generated from the Azure portal, which I hardcoded and tested.
The differences observed between the two tokens are shown below:
Code generated : se=2022-09-15T17%3A47%3A06Z&sp=r&sv=2021-08-06&sr=b&sig=xxx (sr is b)
Manually Generated : sp=r&st=2022-09-14T17:13:49Z&se=2022-09-17T01:13:49Z&spr=https&sv=2021-06-08&sr=c&sig=xxxx (sr is c)
How can I generate a SAS token for an Azure storage container? That is, the SAS token must have all required fields, and in particular signedResource (sr) must be container (c).
I tried to reproduce the same in my environment and got the results below:
from datetime import datetime, timedelta
from azure.storage.blob import BlobClient, generate_blob_sas, BlobSasPermissions
account_name = 'your account name'
account_key = 'your account key'
container_name = 'container'
blob_name = 'pandas.json'
def get_blob_sas(account_name, account_key, container_name, blob_name):
    sas_blob = generate_blob_sas(account_name=account_name,
                                 container_name=container_name,
                                 blob_name=blob_name,
                                 account_key=account_key,
                                 permission=BlobSasPermissions(read=True),
                                 expiry=datetime.utcnow() + timedelta(hours=1))
    return sas_blob
blob = get_blob_sas(account_name, account_key, container_name, blob_name)
print(blob)
I got similar output.
I then added a start time when generating the SAS in code.
Output: it has all the required fields for a SAS:
st=2022-09-15T12%3A23%3A44Z&se=2022-09-15T13%3A23%3A44Z&sp=r&sv=2021-08-06&sr=b&sig=XXXXX
I also tested my generated SAS token with the URL and got the result below:
It says BlobNotFound, "The specified blob does not exist." Strangely, it says the same for the one generated manually.
I saw this error in your comment. Please check that you have given the correct blob name and path. The URL should have the following form:
https://{storage-account}.blob.core.windows.net/{container-name}/files/{file-name}.pdf?{SAS token}
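For the container-level token the question asks about (sr=c), `azure-storage-blob` also provides `generate_container_sas`. The following is a minimal sketch, assuming the v12 SDK is installed. The helper names `sas_window`, `make_container_sas` and `blob_url_with_sas`, the five-minute backdating of the start time, and the read plus list permissions are illustrative choices, not something taken from the posts above.

```python
from datetime import datetime, timedelta, timezone


def sas_window(now, hours_valid, clock_skew_minutes=5):
    """Return (start, expiry) for a SAS token.

    The start is backdated a few minutes so that small clock differences
    between client and service do not make the token look not-yet-valid.
    """
    start = now - timedelta(minutes=clock_skew_minutes)
    expiry = now + timedelta(hours=hours_valid)
    return start, expiry


def make_container_sas(account_name, account_key, container_name, hours_valid=24):
    """Generate a container-scoped SAS token (signed resource 'c')."""
    # Imported lazily so the pure helpers work without the Azure SDK.
    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    start, expiry = sas_window(datetime.now(timezone.utc), hours_valid)
    return generate_container_sas(
        account_name=account_name,
        container_name=container_name,
        account_key=account_key,
        permission=ContainerSasPermissions(read=True, list=True),
        start=start,
        expiry=expiry,
    )


def blob_url_with_sas(account_name, container_name, blob_path, sas_token):
    """Build a blob URL with the SAS token appended as the query string."""
    return "https://{}.blob.core.windows.net/{}/{}?{}".format(
        account_name, container_name, blob_path.lstrip("/"), sas_token)
```

A container-scoped token can be appended to the URL of any blob in that container, which is why the portal-generated token in the question could be reused across blobs.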

Deploy resource in Azure with python

I am working on deploying resources in Azure using Python, based on provided templates. As a starting point I am working with https://github.com/Azure-Samples/Hybrid-Resource-Manager-Python-Template-Deployment
Using it as is, I run into an issue at the beginning of the deployment (the deploy function in deployer.py):
def deploy(self, template, parameters):
    """Deploy the template to a resource group."""
    self.client.resource_groups.create_or_update(
        self.resourceGroup,
        {
            'location': os.environ['AZURE_RESOURCE_LOCATION']
        }
    )
The error message is
Message='ServicePrincipalCredentials' object has no attribute 'get_token'
The statement is correct: ServicePrincipalCredentials has no get_token attribute, although it does have token. Could this be an error caused by an outdated version?
Based on the constructor, the error may come from the creation of the credentials or of the client:
def __init__(self, subscription_id, resource_group, pub_ssh_key_path='~/id_rsa.pub'):
    mystack_cloud = get_cloud_from_metadata_endpoint(
        os.environ['ARM_ENDPOINT'])
    # This may be an error, as subscription_id is already provided as a parameter
    subscription_id = os.environ['AZURE_SUBSCRIPTION_ID']
    credentials = ServicePrincipalCredentials(
        client_id=os.environ['AZURE_CLIENT_ID'],
        secret=os.environ['AZURE_CLIENT_SECRET'],
        tenant=os.environ['AZURE_TENANT_ID'],
        cloud_environment=mystack_cloud
    )  # <-- possibly here
    self.subscription_id = subscription_id
    self.resource_group = resource_group
    self.dns_label_prefix = self.name_generator.haikunate()
    pub_ssh_key_path = os.path.expanduser(pub_ssh_key_path)
    # Will raise if file not exists or not enough permission
    with open(pub_ssh_key_path, 'r') as pub_ssh_file_fd:
        self.pub_ssh_key = pub_ssh_file_fd.read()
    self.credentials = credentials
    self.client = ResourceManagementClient(self.credentials, self.subscription_id,
        base_url=mystack_cloud.endpoints.resource_manager)  # <-- or here
Do you know how I can fix this?
After struggling a little, I found a solution. Just replace
credentials = ServicePrincipalCredentials(
    client_id=os.environ['AZURE_CLIENT_ID'],
    secret=os.environ['AZURE_CLIENT_SECRET'],
    tenant=os.environ['AZURE_TENANT_ID'],
    cloud_environment=mystack_cloud
)
with
self.credentials = DefaultAzureCredential()
The final code looks like:
from azure.identity import DefaultAzureCredential
def __init__(self, subscriptionId, resourceGroup):
    endpoints = get_cloud_from_metadata_endpoint(os.environ.get("ARM_ENDPOINT"))
    self.subscriptionId = subscriptionId
    self.resourceGroup = resourceGroup
    self.credentials = DefaultAzureCredential()
    self.client = ResourceManagementClient(self.credentials, self.subscriptionId,
        base_url=endpoints.endpoints.resource_manager)
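A likely explanation for why the swap works: newer releases of the Azure management libraries call `get_token()` on whatever credential they are given, which `azure.identity` credentials implement, while the older `msrestazure`-style `ServicePrincipalCredentials` only exposes `token`. `DefaultAzureCredential` can also pick up the same `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET` and `AZURE_TENANT_ID` environment variables. The sketch below illustrates the mismatch with two stand-in classes (both hypothetical, not the real SDK types) and a small check that could be run before constructing a client.

```python
class LegacyStyleCredential:
    """Stand-in for an older credential that exposes `token` but no get_token()."""

    def __init__(self):
        self.token = {"access_token": "dummy"}


class TokenStyleCredential:
    """Stand-in for a newer credential that implements get_token()."""

    def get_token(self, *scopes):
        return "dummy-token"


def supports_get_token(credential):
    """Return True if the credential offers the get_token() method."""
    return callable(getattr(credential, "get_token", None))


def require_token_credential(credential):
    """Fail early with a readable message instead of a late AttributeError."""
    if not supports_get_token(credential):
        raise TypeError(
            "%s has no get_token(); use an azure.identity credential instead"
            % type(credential).__name__)
    return credential
```

Checking up front like this is optional; it only turns the AttributeError from the question into a clearer message at construction time.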

Python Boto3 to Aws Sdk for blob storage

This code retrieves the buckets of an Amazon S3-compatible storage service (not Amazon AWS, but the S3-compatible Zadara cloud storage), and it works:
import boto3
from botocore.client import Config
session = boto3.session.Session()
s3_client = session.client(
    service_name='s3',
    region_name='IT',
    aws_access_key_id='xyz',
    aws_secret_access_key='abcedf',
    endpoint_url='https://nothing.com:443',
    config=Config(signature_version='s3v4'),
)
print('Buckets')
boto3.set_stream_logger(name='botocore')
print(s3_client.list_buckets())
I am trying to use the same approach to access S3 from C# with the AWS SDK, but I always get the error "The request signature we calculated does not match the signature you provided. Check your key and signing method.".
AmazonS3Config config = new AmazonS3Config();
config.AuthenticationServiceName = "s3";
config.ServiceURL = "https://nothing.com:443";
config.SignatureVersion = "s3v4";
config.AuthenticationRegion = "it";
AmazonS3Client client = new AmazonS3Client(
"xyz",
"abcdef",
config);
ListBucketsResponse r = await client.ListBucketsAsync();
What can I do? Why is it not working? I can't find a solution.
I also tried to enable debug tracing:
Python
boto3.set_stream_logger(name='botocore')
C#
AWSConfigs.LoggingConfig.LogResponses = ResponseLoggingOption.Always;
AWSConfigs.LoggingConfig.LogMetrics = true;
AWSConfigs.LoggingConfig.LogTo = Amazon.LoggingOptions.SystemDiagnostics;
AWSConfigs.AddTraceListener("Amazon", new System.Diagnostics.ConsoleTraceListener());
but in C# it does not log the whole request.
Any suggestions?

Azure: create storage account with container and upload blob to it in Python

I'm trying to create a storage account in Azure and upload a blob into it using the Python SDK.
I managed to create an account like this:
client = get_client_from_auth_file(StorageManagementClient)
storage_account = client.storage_accounts.create(
    resourceGroup,
    name,
    StorageAccountCreateParameters(
        sku=Sku(name=SkuName.standard_ragrs),
        enable_https_traffic_only=True,
        kind=Kind.storage,
        location=region)).result()
The problem is that when I later try to create a container, I don't know what to pass as "account_url".
I have tried doing:
client = get_client_from_auth_file(BlobServiceClient, account_url=storage_account.primary_endpoints.blob)
return client.create_container(name)
But I'm getting:
azure.core.exceptions.ResourceNotFoundError: The specified resource does not exist
I did manage to create a container using:
client = get_client_from_auth_file(StorageManagementClient)
return client.blob_containers.create(
    resourceGroup,
    storage_account.name,
    name,
    BlobContainer(),
    public_access=PublicAccess.Container
)
But later, when I try to upload a blob using BlobServiceClient or BlobClient, I still need the "account_url", so I still get an error:
azure.core.exceptions.ResourceNotFoundError: The specified resource does not exist
Can anyone help me understand how to get the account_url for a storage account I created with the SDK?
EDIT:
I managed to find a workaround to the problem by creating the connection string from the storage keys.
storage_client = get_client_from_auth_file(StorageManagementClient)
storage_keys = storage_client.storage_accounts.list_keys(resource_group, account_name)
storage_key = next(v.value for v in storage_keys.keys)
return BlobServiceClient.from_connection_string(
    'DefaultEndpointsProtocol=https;' +
    f'AccountName={account_name};' +
    f'AccountKey={storage_key};' +
    'EndpointSuffix=core.windows.net')
This works, but I think George Chen's answer is more elegant.
I could reproduce this problem. I found that get_client_from_auth_file does not pass the credential to the BlobServiceClient, because a BlobServiceClient created with just account_url and no credential can also print the account name.
So if you want to use a credential to get a BlobServiceClient, you could use the code below and then do the other operations.
credentials = ClientSecretCredential(
    'tenant_id',
    'application_id',
    'application_secret'
)
blobserviceclient = BlobServiceClient(account_url=storage_account.primary_endpoints.blob, credential=credentials)
If you don't want to do it this way, you could create the BlobServiceClient with the account key.
client = get_client_from_auth_file(StorageManagementClient, auth_path='auth')
storage_account = client.storage_accounts.create(
    'group name',
    'account name',
    StorageAccountCreateParameters(
        sku=Sku(name=SkuName.standard_ragrs),
        enable_https_traffic_only=True,
        kind=Kind.storage,
        location='eastus',)).result()
storage_keys = client.storage_accounts.list_keys(resource_group_name='group name', account_name='account name')
storage_keys = {v.key_name: v.value for v in storage_keys.keys}
blobserviceclient = BlobServiceClient(account_url=storage_account.primary_endpoints.blob, credential=storage_keys['key1'])
blobserviceclient.create_container(name='container name')
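If only the account name is at hand, the blob endpoint can usually be derived from it. The sketch below assumes the public Azure cloud, where the endpoint suffix is `core.windows.net` (sovereign or Azure Stack clouds use a different suffix). The helper names `blob_account_url`, `build_connection_string` and `make_blob_service_client` are hypothetical and only serve the illustration.

```python
def blob_account_url(account_name, endpoint_suffix="core.windows.net"):
    """Build the blob service URL for a storage account."""
    return "https://{}.blob.{}".format(account_name, endpoint_suffix)


def build_connection_string(account_name, account_key,
                            endpoint_suffix="core.windows.net"):
    """Assemble a storage connection string from its parts."""
    return ";".join([
        "DefaultEndpointsProtocol=https",
        "AccountName={}".format(account_name),
        "AccountKey={}".format(account_key),
        "EndpointSuffix={}".format(endpoint_suffix),
    ])


def make_blob_service_client(account_name, account_key):
    """Create a BlobServiceClient authenticated with the account key."""
    # Imported lazily so the string helpers work without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    return BlobServiceClient(
        account_url=blob_account_url(account_name),
        credential=account_key)
```

Using `storage_account.primary_endpoints.blob`, as in the answer, avoids hardcoding the suffix and is probably the safer choice when the account object is available.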

How to run a BigQuery query in Python

This is the query that I have been running in BigQuery and that I want to run in my Python script. How would I change it, or what do I have to add, for it to run in Python?
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have been researching, it seems that I can't save this query as a permanent table using Python. Is that true? And if it is true, is it still possible to export a temporary table?
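You need to use the BigQuery Python client library; then something like this should get you up and running (note that run_async_query belongs to older releases of the library, and later versions use client.query instead):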
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
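With more recent releases of `google-cloud-bigquery`, writing query results to a permanent table goes through `client.query` and a `QueryJobConfig`. The following is a rough sketch of that approach. The helper names `qualified_table_id` and `run_query_to_table` are made up for illustration, and the project, dataset and table values passed in are placeholders to be supplied by the caller.

```python
def qualified_table_id(project, dataset_id, table_id):
    """Return the 'project.dataset.table' form used by standard SQL."""
    return "{}.{}.{}".format(project, dataset_id, table_id)


def run_query_to_table(client, sql, project, dataset_id, table_id):
    """Run a query and save its results to a permanent destination table."""
    # Imported lazily so the pure helper works without the library.
    from google.cloud import bigquery

    destination = bigquery.TableReference.from_string(
        qualified_table_id(project, dataset_id, table_id))
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        # Overwrite the destination table if it already exists.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    query_job = client.query(sql, job_config=job_config)  # API request
    return query_job.result()  # Waits for the query to finish.
```

Here `client` would be a `bigquery.Client` created as in the other answers, and `sql` could be the query from the question passed as a string.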
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pandas as pd
import pydata_google_auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]
credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)
query = "SELECT * FROM my_table"
# MY_PROJECT_ID is a placeholder for your Google Cloud project ID
data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started, you need to generate a BigQuery JSON key for external app access.
Your code would look something like:
from pythonbq import pythonbq
myProject = pythonbq(
    bq_key_path='path/to/bq/key.json',
    project_id='myGoogleProjectID'
)
SQL_CODE = """
SELECT
  Serial,
  MAX(createdAt) AS Latest_Use,
  SUM(ConnectionTime/3600) as Total_Hours,
  COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output = myProject.query(sql=SQL_CODE)
