I would like to access the contents of my Azure Data Lake Gen2 storage from my local Python editor. What would be the best way to do this?
I googled, but there are multiple ways to do this, for instance SAS tokens and service principals.
Could someone please provide any pointers in the right direction?
Thank you.
Use the DataLakeServiceClient to connect:
https://learn.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python
And use this to create a credential:
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
And for anyone you want to grant access, you need to assign an access role (RBAC); users without the corresponding permissions cannot perform the specific operations.
You can try the above steps.
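Putting the steps together, here is a minimal sketch; the account URL and filesystem name ("mydatalake", "myfilesystem") are placeholders for your own values:
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholders: replace with your own storage account and filesystem.
credential = DefaultAzureCredential()
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=credential,
)

# List everything in the filesystem (container).
fs = service.get_file_system_client("myfilesystem")
for path in fs.get_paths():
    print(path.name)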
For my current Python project I'm using the Microsoft Azure SDK for Python.
I want to copy a specific blob from one container path to another and have already tested some options, described here.
Overall they are basically "working", but unfortunately the new_blob.start_copy_from_url(source_blob_url) command always leads to an error: ErrorCode:CannotVerifyCopySource.
Is someone getting the same error message here, or has an idea how to solve it?
I also tried passing the source_blob_url with a SAS token appended, but it still doesn't work. I have the feeling that there is some connection to the access levels of the storage account, but so far I wasn't able to figure it out. Hopefully someone here can help me.
As you have mentioned, you might be receiving this error due to missing permissions on the SAS token.
The difference to my code was that I used the blob storage SAS token from the Azure portal instead of generating it directly for the blob client with the Azure function.
To allow access to certain areas of your storage account, a SAS is generated with a set of constraints such as read/write permissions, allowed services, resource types, start and expiry date/time, allowed IP addresses, and so on.
You don't always need to generate it directly for the blob client in code; you can also generate one from the portal, as long as you grant it the required permissions.
REFERENCES: Grant limited access to Azure Storage resources using SAS - MSFT Document
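A minimal sketch of generating a read SAS for the source blob in code and then starting the copy; the account name, key, container, and blob names below are all placeholders:
from datetime import datetime, timedelta
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

account_name = "mystorageaccount"  # placeholder
account_key = "<account-key>"      # placeholder
service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)

# A read-only SAS on the source blob lets the service verify the copy source.
sas = generate_blob_sas(
    account_name=account_name,
    container_name="source-container",
    blob_name="folder/file.txt",
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)

source_blob_url = (
    f"https://{account_name}.blob.core.windows.net/"
    f"source-container/folder/file.txt?{sas}"
)
new_blob = service.get_blob_client("dest-container", "folder/file.txt")
new_blob.start_copy_from_url(source_blob_url)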
Currently, to create the Google Storage client, I'm using:
client = storage.Client.from_service_account_json('creds.json')
But I need to change the client dynamically and would prefer not to store auth files on the local filesystem.
So, is there another way to connect, passing the credentials as a variable?
Something like this for AWS with boto3:
iam_client = boto3.client(
    'iam',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY
)
I guess I'm missing something in the docs and would be happy if someone could point me to where I can find this.
If you want to use built-in methods, an option could be to call the Client constructor (Cloud Storage) directly with credentials. These two links can be helpful to perform that action.
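For example, a minimal sketch that builds the credentials from an in-memory JSON string; SERVICE_ACCOUNT_JSON is a placeholder for wherever you keep the key material (an environment variable, a secret manager, etc.):
import json

from google.cloud import storage
from google.oauth2 import service_account

# Parse the service-account key from a variable instead of a file on disk.
info = json.loads(SERVICE_ACCOUNT_JSON)  # placeholder variable
credentials = service_account.Credentials.from_service_account_info(info)
client = storage.Client(project=info["project_id"], credentials=credentials)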
Another possible option to avoid storing auth files locally is using an environment variable pointing to credentials kept outside of your application's code, for instance in Cloud Key Management Service. To have more context about this you can take a look at this article.
I'm quite new to GCP and am now struggling to list files after a given key.
In AWS, we can provide an additional parameter, StartAfter, to the boto3 list_objects_v2() call on the S3 client. It will then start listing files from that particular key onward.
kwargs["StartAfter"] = start_after_file
response = self._storage_client.list_objects_v2(
Bucket=self._bucket_name,
Prefix=prefix,
**kwargs
)
I need to do the same in GCP with Google Cloud Storage (in Python). I'm going to use list_blobs() in the Storage Client class, but I can't find any way to do this.
The prefix parameter won't help, since it only returns the files with that prefix.
Does anyone know how I can achieve this?
According to the documentation for this library, there is no method to achieve this directly; you would need to filter the response in your code.
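A minimal sketch of that client-side filtering; the bucket name is a placeholder, and it relies on list_blobs() returning blobs in lexicographic order of name:
from google.cloud import storage

client = storage.Client()
blobs = client.list_blobs("my-bucket", prefix=prefix)  # placeholder bucket

# Blobs come back sorted by name, so skip everything up to the start key.
for blob in blobs:
    if blob.name <= start_after_file:
        continue
    print(blob.name)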
Nevertheless, you can open a feature request for this to be revised, or you can contact the team responsible for the library on their GitHub.
Hope you find this useful.
I am new to Azure. I am learning the Azure Python SDK and have some doubts.
I am not using any credentials to log in to my Azure account in the code below, and I can still access the VMs in my subscription. How?
I am trying to get a list of all VMs using list_all(), which is described in the Azure docs: https://learn.microsoft.com/en-us/python/api/azure-mgmt-compute/azure.mgmt.compute.v2018_10_01.operations.virtualmachinesoperations?view=azure-python#list-all-custom-headers-none--raw-false----operation-config-
How can I get the list of VMs, or how can I iterate over the VirtualMachinePaged object returned by list_all()?
When I tried to print the name of a VM using print(client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')), I got the error Resource group 'GSLab' could not be found.
I checked and am sure that the name of the resource group is 'GSLab', so why am I getting this error?
Here is my code. Thank you, and please suggest any other sources for a better understanding of these concepts if possible.
from azure.common.client_factory import get_client_from_auth_file
from azure.mgmt.compute import ComputeManagementClient

client = get_client_from_auth_file(ComputeManagementClient)
#print(client)
vmlist = client.virtual_machines.list_all()
print(vmlist)
for vm in vmlist:
    print(vm.name)
print(client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2'))
Q1: You get the credentials from the authentication file that you set; get_client_from_auth_file reads the path from the AZURE_AUTH_LOCATION environment variable when none is passed, and the service principal's credentials are inside that file.
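A short sketch making that explicit; the file path is a placeholder:
from azure.common.client_factory import get_client_from_auth_file
from azure.mgmt.compute import ComputeManagementClient

# Pass auth_path explicitly instead of relying on AZURE_AUTH_LOCATION.
client = get_client_from_auth_file(
    ComputeManagementClient,
    auth_path="/path/to/azure_auth.json",  # placeholder path
)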
Q2: You just need to delete the print(vmlist) line and then everything is OK.
Q3: The code
client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')
fails with exactly that error when the resource group cannot be found, so check whether the resource group 'GSLab' really exists in the subscription you set in the authentication file.
This code is correct:
vmlist = client.virtual_machines.list_all()
for vm in vmlist:
    print(vm.name)
and so is this one:
client.virtual_machines.get(resource_group_name='GSLab', vm_name='GSLabVM2')
If they both return nothing, you authenticated against the wrong subscription; you need to authenticate to the proper one.
A simple way to check that you get some output:
next(iter(vmlist)).name
I would like to use the credentials stored in ~/.amazon-product-api to make requests. The simple-product-api package tells me to do it like this:
from amazon.api import AmazonAPI
amazon = AmazonAPI(AMAZON_ACCESS_KEY, AMAZON_SECRET_KEY, AMAZON_ASSOC_TAG)
But I don't want to put my credentials in the code. Unfortunately, python-amazon-product-api doesn't support Python 3 yet; otherwise I could have done what is suggested on this page: https://python-amazon-product-api.readthedocs.io/en/latest/basic-usage.html
Does anyone have a way to use the stored credentials, or a way to not include the credentials in the code?
python-amazon-product-api is tested on Python 3.5.
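If you would rather keep loading the stored keys yourself, here is a sketch; it assumes ~/.amazon-product-api is an INI file with a [Credentials] section (the layout python-amazon-product-api documents), so adjust the section and key names to whatever your file actually contains:
import configparser
from pathlib import Path

from amazon.api import AmazonAPI

# Assumed file layout:
#   [Credentials]
#   access_key = ...
#   secret_key = ...
#   associate_tag = ...
config = configparser.ConfigParser()
config.read(Path.home() / ".amazon-product-api")
creds = config["Credentials"]

amazon = AmazonAPI(creds["access_key"], creds["secret_key"],
                   creds["associate_tag"])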