I just received access to the bucket gs://asdasdasdasdd-sadasdasd on Google Cloud Storage with files for a test exercise.
They said I have access for my Google account.
But how am I supposed to download a file from there in Python? With which credentials?
I created a service account and downloaded a JSON file with my credentials, but I am forbidden to download files from the bucket.
How should I proceed?
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import BytesIO
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "account.json"
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket('asdasdasdasdd-sadasdasd')
blob = bucket.blob('streams/2017/09/09/allcountries')
path = "gs://asdasdasdasdd-sadasdasd/streams/2017/09/09/allcountries.csv"
df = pd.read_csv(path)
I am able to download the file with gsutil, but I need to do the same with Python. Somehow I need to authenticate with my Google account, because the access to download the file was granted to my Google email.
I assume you were granted a role to access the bucket. If so, you do not need the service account key (the .json file), as that key was generated by you and therefore grants permissions to resources under your own project, not someone else's.
Make sure the role you were given includes the storage.objects.get permission, for example roles/storage.objectViewer (roles/storage.admin also works), as that is what is needed to download files from the specified bucket.
Another option would be to indeed use a service account key with the same role, but it has to be given to you by the owner of the bucket.
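If the grant really is on your personal Google account, one untested option is to authenticate with application default credentials from that account instead of a service account key. This is only a sketch: it assumes you have the gcloud CLI installed and some project of your own to attach the client to (the project id below is a placeholder), while the bucket and object names are taken from your question.
# Run once in a terminal; it opens a browser to sign in with your Google
# account and stores application default credentials locally:
#   gcloud auth application-default login
from google.cloud import storage
# No GOOGLE_APPLICATION_CREDENTIALS needed; the client picks up the
# application default credentials created above.
storage_client = storage.Client(project="your-own-project-id")  # placeholder project
bucket = storage_client.bucket("asdasdasdasdd-sadasdasd")
blob = bucket.blob("streams/2017/09/09/allcountries.csv")
blob.download_to_filename("allcountries.csv")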
Lastly, I tried your code and came across an error once I was able to connect to the bucket. If you encounter an IOError telling you the file does not exist, take a look at this post for a possible solution.
Give this a try:
import pathlib
import google.cloud.storage as gcs
client = gcs.Client()
#set target file to write to
target = pathlib.Path("local_file.txt")
#set file to download
FULL_FILE_PATH = "gs://bucket_name/folder_name/file_name.txt"
#open filestream with write permissions
with target.open(mode="wb") as downloaded_file:
    # download and write file locally
    client.download_blob_to_file(FULL_FILE_PATH, downloaded_file)
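Since your end goal seems to be a pandas DataFrame, here is an alternative, untested sketch that downloads the blob into memory and parses it directly; it assumes the object is a CSV, and the bucket/object names are the ones from your question.
import pandas as pd
from io import BytesIO
import google.cloud.storage as gcs

client = gcs.Client()
bucket = client.bucket("asdasdasdasdd-sadasdasd")
blob = bucket.blob("streams/2017/09/09/allcountries.csv")
# download the blob into memory and parse it with pandas
# (use download_as_string() on older versions of google-cloud-storage)
df = pd.read_csv(BytesIO(blob.download_as_bytes()))
print(df.head())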
I want to download files using "file.json", which includes all the URLs. Here is the code I tried in Python; however, I am getting this error:
"code":400,"message":"Bucket is requester pays bucket but no user project provided"
I already set up my billing details when I created the GCP account, but I really don't know how to solve this. I am not the owner of the bucket, but I do have permission to get data from it.
Python code:
import os
from google.cloud.bigquery.client import Client
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/key.json'
bq_client = Client()
import json
from google.cloud import storage
client = storage.Client(project='my_projectID')
bucket = client.get_bucket('the_bucket')
file_json = bucket.get_blob('file.json')
data = json.loads(file_json.download_as_string())
You need to provide the user_project in the request, which is a gcp project for which you have billing rights. The requests will then be charged to that project.
You can find a python code sample here: https://cloud.google.com/storage/docs/using-requester-pays#using
bucket = storage_client.bucket(bucket_name, user_project=project_id)
See here for which permissions you need in the user_project: https://cloud.google.com/storage/docs/requester-pays#requirements
serviceusage.services.use
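Putting this together with the code from your question, a sketch could look like the following; my-billing-project is a placeholder for a project on which you have billing rights, while the other names come from your snippet.
import json
from google.cloud import storage

client = storage.Client(project='my_projectID')
# user_project is the project that will be billed for requests
# against this requester-pays bucket
bucket = client.bucket('the_bucket', user_project='my-billing-project')
file_json = bucket.get_blob('file.json')
data = json.loads(file_json.download_as_string())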
I'm attempting to use Python in order to limit which parts of my Azure storage different users can access.
I have been looking for code that can generate a SAS token for a specific directory in my storage container. I am hoping that generating a SAS token on my directory will give me access to the files/blobs it contains (just like in the Azure portal, where I can right-click my directory and press 'Generate SAS').
However, I have not been able to find any Python code that can achieve this.
All I can find are the following 3 functions:
generate_account_sas()
generate_container_sas()
generate_blob_sas()
Found here: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob?view=azure-python
I have attempted to use the generate_blob_sas() function, but with the name of my directory instead of a file/blob name.
from datetime import datetime, timedelta
from azure.storage.blob import BlobClient, generate_blob_sas, BlobSasPermissions
account_name = 'STORAGE_ACCOUNT_NAME'
account_key = 'STORAGE_ACCOUNT_ACCESS_KEY'
container_name = 'CONTAINER_NAME'
blob_name = 'NAME OF MY DIRECTORY'
def get_blob_sas(account_name, account_key, container_name, blob_name):
    sas_blob = generate_blob_sas(account_name=account_name,
                                 container_name=container_name,
                                 blob_name=blob_name,
                                 account_key=account_key,
                                 permission=BlobSasPermissions(read=True),
                                 expiry=datetime.utcnow() + timedelta(hours=1))
    return sas_blob
blob = get_blob_sas(account_name,account_key, container_name, blob_name)
url = 'https://'+account_name+'.blob.core.windows.net/'+container_name+'/'+blob_name+'?'+blob
However, when I attempt to use this URL, I get the following response:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:31qv254a-201e-0509-3f26-8587fb000000 Time:2021-07-30T09:37:21.1142028Z</Message>
<AuthenticationErrorDetail>Signature did not match. String to sign used was rt 2021-07-30T10:08:37Z /blob/my_account/my_container/my_directory/my_file.png 2020-06-12 b </AuthenticationErrorDetail>
</Error>
Is there some other way for me, to generate a SAS token on a directory?
From your description, it looks like your storage account is Data Lake Gen2. If that's the case, then you will need to use a different SDK.
The SDK you're using is for Azure Blob Storage (non-Data Lake Gen2) accounts, where folders are virtual folders and not real ones.
The SDK you would want to use is azure-storage-file-datalake and the method you would want to use for generating a SAS token on a directory will be generate_file_system_sas.
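I haven't verified this end-to-end, but a rough sketch with that package would look something like the following; the account name, key and container name are placeholders taken from your snippet, and in Data Lake Gen2 terms the container is the "file system".
from datetime import datetime, timedelta
from azure.storage.filedatalake import generate_file_system_sas, FileSystemSasPermissions

account_name = 'STORAGE_ACCOUNT_NAME'
account_key = 'STORAGE_ACCOUNT_ACCESS_KEY'
file_system_name = 'CONTAINER_NAME'  # the container acts as the file system

# SAS token scoped to the file system, valid for one hour
sas_token = generate_file_system_sas(
    account_name=account_name,
    file_system_name=file_system_name,
    credential=account_key,
    permission=FileSystemSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)
print(sas_token)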
So I'm trying to make a Python API so the user can upload a PDF file, and the API then sends it directly to Azure Storage. What I found is that I must have the file on a local path, i.e.:
container_client = ContainerClient.from_connection_string(conn_str=conn_str,container_name='mycontainer')
with open('mylocalpath/myfile.pdf', "rb") as data:
    container_client.upload_blob(name='myblockblob.pdf', data=data)
Another solution is to store it on the VM first and point the local path at it, but I don't want to fill up my VM.
If you want to upload it directly from the client side to an Azure Storage blob, instead of receiving the file in your API, you can use a shared access signature (SAS) on your storage account. Your API can then expose a function that generates a pre-signed URL from that SAS and returns the URL to the client, which allows the client to upload the file to your blob via that URL.
To generate the URL, you can follow the code below:
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions
account_name = "<storage account name>"
account_key = "<account key>"  # get this from the Access keys section of the storage account
container_name = "<container name>"

def getpushurl(filename):
    # short-lived SAS token with write permission for the target blob
    token = generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        account_key=account_key,
        permission=BlobSasPermissions(write=True),
        expiry=datetime.utcnow() + timedelta(seconds=100),
        blob_name=filename,
    )
    url = f"https://{account_name}.blob.core.windows.net/{container_name}/{filename}?{token}"
    return url

pdfpushurl = getpushurl("demo.pdf")
print(pdfpushurl)
After generating this URL, give it to the client; the client can then send the file to that URL with a PUT request, and it will be uploaded directly to Azure Storage.
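On the client side, the upload against that URL could then look roughly like the sketch below (using requests as just one way to send it; the x-ms-blob-type header is required when creating a block blob with a plain PUT, and the local path is the one from the question):
import requests

sas_url = getpushurl("demo.pdf")
with open("mylocalpath/myfile.pdf", "rb") as data:
    response = requests.put(
        sas_url,
        data=data,
        headers={"x-ms-blob-type": "BlockBlob", "Content-Type": "application/pdf"},
    )
print(response.status_code)  # expect 201 Created on success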
You can generate a SAS token with write permission for your users so that they can upload .pdf files directly from their side without storing them on the server. For details, please see my previous post here.
Try the code below to generate a SAS token with container write permission:
from azure.storage.blob import BlobServiceClient,ContainerSasPermissions,generate_container_sas
from datetime import datetime, timedelta
storage_connection_string=''
container_name = ''
block_blob_service = BlobServiceClient.from_connection_string(storage_connection_string)
container_client = block_blob_service.get_container_client(container_name)
sasToken = generate_container_sas(account_name=container_client.account_name,
                                  container_name=container_client.container_name,
                                  account_key=container_client.credential.account_key,
                                  # grant write permission only
                                  permission=ContainerSasPermissions(write=True),
                                  start=datetime.utcnow() - timedelta(minutes=1),
                                  # 1 hour valid time
                                  expiry=datetime.utcnow() + timedelta(hours=1)
                                  )
print(sasToken)
After you have returned this SAS token to your user, see this official guide to upload files from an HTML page; I think it would be helpful if you are developing a web app.
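For a quick test from Python rather than HTML, a sketch of how the user could consume that token with BlobClient (the account name and local file path are placeholders, the other names come from the code above):
from azure.storage.blob import BlobClient

account_name = "<storage account name>"  # placeholder
blob_client = BlobClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    container_name=container_name,
    blob_name="myblockblob.pdf",
    credential=sasToken,  # the container SAS generated above
)
with open("mylocalpath/myfile.pdf", "rb") as data:
    blob_client.upload_blob(data)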
I want to get the ETag associated with a file that is uploaded to my storage account, in my Python code.
Please use the code below:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='xx', account_key='xx')
myetag = block_blob_service.get_blob_properties("your_container","the_blob_name").properties.etag
print(myetag)
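Note that BlockBlobService comes from the legacy azure-storage-blob 2.x SDK. If you are on the current azure-storage-blob (v12) package instead, a rough equivalent sketch (connection string as a placeholder) would be:
from azure.storage.blob import BlobServiceClient

# v12 SDK: authenticate with a connection string instead of account name/key
blob_service_client = BlobServiceClient.from_connection_string("<connection string>")
blob_client = blob_service_client.get_blob_client("your_container", "the_blob_name")
myetag = blob_client.get_blob_properties().etag
print(myetag)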
Hi there, first and foremost this is my first time using Google's services. I'm trying to develop an app with the Google AutoML Vision API (custom model). I have already built a custom model and generated the API keys (I hope I did it correctly, though).
After many attempts at developing via Ionic & Android, I failed to connect to the API.
I have now taken the prediction code provided in Python (on Google Colab), and even with that I still get an error message saying Could not automatically determine credentials. I'm not sure where I have gone wrong. Please help.
#installing & importing libraries
!pip3 install google-cloud-automl
import sys
from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2
#import key.json file generated by GOOGLE_APPLICATION_CREDENTIALS
from google.colab import files
credentials = files.upload()
#explicit function given by Google accounts
# https://cloud.google.com/docs/authentication/production#auth-cloud-implicit-python
def explicit():
    from google.cloud import storage
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(credentials)
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
#import image for prediction
from google.colab import files
YOUR_LOCAL_IMAGE_FILE = files.upload()
#prediction code from modelling
def get_prediction(content, project_id, model_id):
    prediction_client = automl_v1beta1.PredictionServiceClient()
    name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
    payload = {'image': {'image_bytes': content}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request  # waits till request is returned
#print function substitute with values
content = YOUR_LOCAL_IMAGE_FILE
project_id = "REDACTED_PROJECT_ID"
model_id = "REDACTED_MODEL_ID"
print (get_prediction(content, project_id, model_id))
The error message when running the last line of code is the Could not automatically determine credentials error mentioned above.
credentials = files.upload()
storage_client = storage.Client.from_service_account_json(credentials)
These two lines are the issue, I think.
The first one actually loads the contents of the file, but the second one expects a path to a file instead of the contents.
Let's tackle the first line first:
Just passing the credentials you get after calling credentials = files.upload() will not work, as explained in the docs for it. Done the way you're doing it, credentials does not contain the contents of the file directly, but rather a dictionary mapping filenames to contents.
Assuming you're only uploading the one credentials file, you can get its contents like this (taken from this SO answer):
from google.colab import files
uploaded = files.upload()
# dict_keys is not subscriptable in Python 3, so materialise it as a list first
credentials_as_string = uploaded[list(uploaded.keys())[0]]
So now we actually have the contents of the uploaded file as a string, next step is to create an actual credentials object out of it.
This answer on Github shows how to create a credentials object from a string converted to json.
import json
from google.oauth2 import service_account
credentials_as_dict = json.loads(credentials_as_string)
credentials = service_account.Credentials.from_service_account_info(credentials_as_dict)
Finally we can create the storage client object using this credentials object:
storage_client = storage.Client(credentials=credentials)
Please note I've not tested this though, so please give it a go and see if it actually works.
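One more untested note: the Could not automatically determine credentials error in your output actually comes from the AutoML prediction client, not the storage client, so you would likely need to hand the same credentials object to it as well, for example:
from google.cloud import automl_v1beta1

# reuse the credentials object built above instead of relying on
# GOOGLE_APPLICATION_CREDENTIALS being set in the Colab environment
prediction_client = automl_v1beta1.PredictionServiceClient(credentials=credentials)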