I would like to know if there is a method to check whether a blob (task) was successfully saved in Azure, and whether something is returned when the save succeeds.
My current code uploads to Azure using BlobServiceClient. What I am trying to do is delete the tasks that were uploaded to Azure and keep the ones that were not saved in the archive.
What I have tried is the code below, but it only checks whether the file already exists in Azure rather than confirming that the upload itself succeeded.
async def if_uploaded(works):
    container_name = "works"
    try:
        block_blob_service = BlockBlobService(account_name=accountName,
                                              account_key=accountKey,
                                              socket_timeout=10000)
        # Note: exists() only tells you the blob is present now,
        # not that a particular upload operation succeeded
        return block_blob_service.exists(container_name, works)
    except Exception as e:
        print(e)
        return False
The code for upload is
async def upload(works):
If you do not get any error while uploading, that means the blob was uploaded successfully. You don't really need to do anything else to check whether the upload operation succeeded.
To check if the blob upload was successful, just wrap your upload code in try/except block. Something like:
try:
    await upload_file(blob_service_client, container_name, file_name, content)
    # blob upload successful...do something here...
except Exception as e:
    # blob upload failed...do something here...
    print(f"An exception occurred: {e}")
I'm new to Azure. I need an HTTP-triggered function to perform some simple actions on my blob storage. This will be part of a pipeline in Data Factory, but first I need to figure out how to edit blobs via functions. I'm stuck now because I have no idea which API/methods I could use. I'd appreciate your help. Below is part of my code.
import logging

import requests
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')
    if name:
        requested_file = requests.get("web address")
        ### Should I connect to blob here?
        with open("file.txt", "wb") as file:
            file.write(requested_file.content)
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
            "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
            status_code=200
        )
You can use either the Python SDK, or bindings and triggers for Azure Functions.
With bindings, you use an input binding to pull in your blob and an output binding to write your blob back out.
Similarly with the SDK, you want to make sure that you are both reading in and writing out. Make sure all your keys are correct, and your containers too!
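As a sketch of the bindings route (the container, blob path, and binding names here are assumptions, not from the question), a function.json along these lines gives the function an output binding it can write the downloaded content to:

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get", "post"]
    },
    {
      "type": "blob",
      "direction": "out",
      "name": "outputblob",
      "path": "mycontainer/file.txt",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
```

With that in place, the handler takes an extra `outputblob: func.Out[bytes]` parameter and calls `outputblob.set(requested_file.content)` instead of writing a local file.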
I have working code that downloads files from one of my buckets in S3 and does some conversion work in Python. I do not embed the Access and Secret Keys in the code; the keys are in my AWS CLI configuration.
import boto3
import botocore
import pyarrow.parquet as pq
# import pandas  # only needed for the csv-to-parquet conversion below

BUCKET_NAME = 'converted-parquet-bucket'  # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet'  # replace with path and key object

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet')  # replace the key object name
except botocore.exceptions.ClientError as e:  # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")  # printed if the object you are looking for does not exist
    else:
        raise

# Uncomment the two lines below to convert csv to parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet', engine='auto', compression='snappy')

data = pq.read_pandas('names.snappy.parquet',
                      columns=['Year of Birth', 'Gender', 'Ethnicity',
                               "Child's First Name", 'Count', 'Rank']).to_pandas()
# print(data)  # prints ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE'])  # prints only the rows matching the query
Could someone help me get this code working without having the access and secret keys embedded in the code or in the AWS configuration?
If you are running your function locally, you need to have your credentials on your local credentials/config file to interact with AWS resources.
One alternative would be to run on AWS Lambda (if your function runs periodically, you can set that up with CloudWatch Events) and use Environment Variables or AWS Security Token Service (STS) to generate temporary credentials.
If you do not want to use secret/access keys, you should use roles and policies instead. Here's the deal:
Define a role (ex. RoleWithAccess) and be sure that your user (defined in your credentials) can assume this role
Set a policy for RoleWithAccess, giving read/write access to your buckets
If you are executing it in your local machine, run the necessary commands (AWS CLI) to create a profile that makes you assume RoleWithAccess (ex. ProfileWithAccess)
Execute your script using a session, passing this profile as the argument, which means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
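For reference, a minimal sketch of what ProfileWithAccess could look like in ~/.aws/config (the account id and role ARN are placeholders):

```ini
[profile ProfileWithAccess]
role_arn = arn:aws:iam::123456789012:role/RoleWithAccess
source_profile = default
```

With this entry, boto3 calls STS under the hood to assume RoleWithAccess whenever the profile is used, so no long-lived keys ever appear in your script.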
The upside of this approach is that if you are running it inside an EC2 instance, you can tie your instance to a specific role when you build it (ex. RoleWithAccess). In that case, you can completely ignore session, profile, all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.
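A minimal sketch of such a read/write policy (the bucket name is a placeholder) that could be attached to the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::converted-parquet-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::converted-parquet-bucket"
    }
  ]
}
```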
I'm trying to upload a csv file to a container. It constantly gives me an error that says: Retry policy did not allow for a retry: , HTTP status code=Unknown, Exception=HTTPSConnectionPool
Here is my code -
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='myAccoutName', account_key='myAccountKey')
block_blob_service.get_blob_to_path(container_name='test1', blob_name='pho.csv', file_path = 'C:\\Users\\A9Q5NZZ\\pho.csv')
I am new to Python so if you can answer with a simple language, that would be really helpful.
Forget uploading a CSV file; it doesn't even let me view existing blobs in an existing container! It gives the same 'Retry Policy' error for the code below:
container_name = 'test1'
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)
I understand I've asked two questions, but I think the error is the same. Any help is appreciated. Again, since I am new to Python, an explanation/code with simpler terms would be great!
The method get_blob_to_path you're using is for downloading a blob to a local file. If you want to upload a local file to Azure Blob Storage, you should use the method block_blob_service.create_blob_from_path(container_name="", blob_name="", file_path="") instead.
The sample code works at my side:
from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='xxx', account_key='xxxx')
block_blob_service.create_blob_from_path(container_name="mycontainer", blob_name="test2.csv", file_path="D:\\temp\\test2.csv")
I need to download some replay files from an API that has the files stored on an amazon s3 bucket, with requester pays enabled.
The problem is, I set up my Amazon AWS account, created an AWSAccessKeyId and AWSSecretKey, but I still can't download a single file, since I'm getting an Access Denied response.
I want to automate all this inside a Python script, so I've been trying to do this with the boto3 package. Also, I installed the Amazon AWS CLI, and set up my access ID and secret key.
The file I've been trying to download (I want to download multiple ones, but for now I'm trying with just one) is this: http://hotsapi.s3-website-eu-west-1.amazonaws.com/18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay
From what I've found so far on SO, I've tried something like this:
import boto3
import botocore

BUCKET_NAME = 'hotsapi'  # replace with your bucket name
KEY = '18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay'  # replace with your object key

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'test.StormReplay')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
And this:
import boto3

s3_client = boto3.Session().client('s3')

response = s3_client.get_object(Bucket='hotsapi',
                                Key='18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay',
                                RequestPayer='requester')
response_content = response['Body'].read()

with open('./B01.StormReplay', 'wb') as file:
    file.write(response_content)
But I still can't manage to download the file.
Any help is welcome! Thanks!
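One detail worth checking in the first snippet: boto3's transfer methods only send the requester-pays acknowledgement when asked to, via ExtraArgs. A sketch along these lines (the helper names are mine, not an established API) may get past the Access Denied:

```python
def requester_pays_extra_args():
    # Requester-pays buckets reject requests that do not explicitly
    # acknowledge the transfer charge; that rejection surfaces as Access Denied.
    return {"RequestPayer": "requester"}

def download_replay(bucket_name, key, destination):
    # boto3 is imported inside the function so the helper above stays dependency-free
    import boto3
    s3 = boto3.resource("s3")
    s3.Bucket(bucket_name).download_file(key, destination,
                                         ExtraArgs=requester_pays_extra_args())
```

The second snippet already passes RequestPayer='requester' to get_object, so if it still fails, the credentials or bucket region would be the next thing to check.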
We get this error when uploading a large file (more than 10Mb but less than 100Mb):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is more than 5Mb
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API looks at the file size and tries to upload it via the multipart or resumable method. I can't imagine that is something that, as a caller of this API, I should be concerned with. Is the problem somehow related to permissions? Does the bucket need special permission so it can accept multipart or resumable uploads?
from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod so the permissions get picked up by storage.Client() call automatically.
We already tried these:
Can't upload with gsutil because the container is Python 3 and gsutil does not run on Python 3.
Tried this example, but it runs into the same error: ('Response headers must contain header', 'location')
There is also this library, but it is basically alpha quality, with little activity and no commits for a year.
Upgraded to google-cloud-storage==1.13.0
Thanks in advance
The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue, the underlying problem is an uninformative error message.