The data is in MS Access, on one of the shared drives on the network. I need this data in Azure Blob Storage as CSV files. Can anyone suggest how to do this?
You can move data to Azure Blob Storage in several ways. You could use AzCopy: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10 , or Storage Explorer (GUI): https://azure.microsoft.com/en-us/features/storage-explorer/
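Or use the Python SDK (this call is from the legacy BlockBlobService API; the arguments are container name, blob name, and local file path):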
block_blob_service.create_blob_from_path(container, file, file)
Python SDK can be found here: https://github.com/Azure/azure-sdk-for-python
Converting the data from Access to CSV is not something Azure Storage does for you. You can use an existing library for the conversion, then upload the resulting CSV files to Blob Storage.
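The convert-then-upload flow could be sketched as follows. This is only an illustration under assumptions: it supposes the Access file is reachable through a DB-API driver such as pyodbc with the Microsoft Access ODBC driver installed, and that you have a SAS URL for the target blob. The function names, the file path in the comment, and `blob_sas_url` are hypothetical.

```python
import csv


def dump_query_to_csv(cursor, query, out):
    """Run `query` on a DB-API cursor and write the result to `out` as CSV."""
    cursor.execute(query)
    writer = csv.writer(out)
    # cursor.description has one entry per column; item 0 is the column name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())


def upload_csv_text(csv_text, blob_sas_url):
    """Upload CSV text to the blob addressed by `blob_sas_url` (hypothetical)."""
    # Imported here so the CSV step does not require the Azure SDK.
    from azure.storage.blob import BlobClient

    client = BlobClient.from_blob_url(blob_sas_url)
    client.upload_blob(csv_text.encode("utf-8"), overwrite=True)


# Assumed way to obtain a cursor for an Access file (path is hypothetical):
#   import pyodbc
#   conn = pyodbc.connect(
#       r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
#       r"DBQ=\\share\data\mydb.accdb;")
#   cursor = conn.cursor()
```

Writing to an in-memory `io.StringIO` and passing its contents to `upload_csv_text` avoids creating a temporary file; for large tables, writing to a file on disk and uploading that file is likely the safer choice.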
Related
I want to migrate files from Digital Ocean Storage into Google Cloud Storage programmatically, without rclone.
I know the exact location of the file in Digital Ocean Storage (DOS), and I have the signed URL for Google Cloud Storage (GCS).
How can I modify the following code so that I can copy the DOS file directly into GCS without an intermediate download to my computer?
from google.cloud import storage

def upload_to_gcs_bucket(blob_name, path_to_file, bucket_name):
    """Upload data to a bucket."""
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        'creds.json')
    # print(list(storage_client.list_buckets()))
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)
    # Returns the public URL of the uploaded object
    return blob.public_url
Google's Storage Transfer Service should be an answer for this type of problem, particularly because DigitalOcean Spaces (like most object stores) is S3-compatible. But (!) I think (I'm unfamiliar with it and unsure) it can't be used for this configuration.
There is no way to transfer files from a source to a destination without some form of intermediate transfer, but you can use memory rather than file storage as the intermediary. Memory is generally more constrained than file storage, and if you wish to run multiple transfers concurrently, each will consume some amount of memory.
It's curious that you're using Signed URLs. Generally Signed URLs are provided by a 3rd-party to limit access to 3rd-party buckets. If you own the destination bucket, then it will be easier to use Google Cloud Storage buckets directly from one of Google's client libraries, such as Python Client Library.
The Python examples include uploading from file and from memory. It will likely be best to stream the files into Cloud Storage if you'd prefer to not create intermediate files. Here's a Python example
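A minimal sketch of such a streaming transfer, under stated assumptions: the source object is reachable over HTTPS through a URL you supply (`source_url` is hypothetical, for example a presigned Spaces URL), the `requests` library is installed, and the installed google-cloud-storage version provides `Blob.open`. Only one chunk is held in memory at a time.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def copy_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Copy from file-like `src` to `dst` in chunks; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def transfer_to_gcs(source_url, bucket_name, blob_name):
    """Stream the object at `source_url` into a Cloud Storage object."""
    # Third-party imports are local so copy_stream works without them.
    import requests
    from google.cloud import storage

    client = storage.Client.from_service_account_json("creds.json")
    blob = client.bucket(bucket_name).blob(blob_name)
    with requests.get(source_url, stream=True) as response:
        response.raise_for_status()
        # response.raw is the undecoded byte stream of the HTTP body.
        with blob.open("wb") as gcs_file:
            return copy_stream(response.raw, gcs_file)
```

The data still passes through the machine running the script, so this saves disk space rather than bandwidth.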
In AWS I have a folder layout like this: Bucketname/Data/files/abc_01-02-2022.csv
There is one file per date, in increasing order, for every month of the year.
In Google Cloud Storage I am trying to create a folder structure like this for the whole year: Bucketname/data/202202/files/abc_01-02-2022.csv
So I am trying to use Storage Transfer Service to derive the folder name dynamically, or from the object itself, and create the folder structure automatically, triggered on the 2nd of each month.
Can we achieve this using the transfer service?
What is the best way to achieve this? I am trying to keep it as simple as possible.
Storage Transfer Service does not support destination object prefixes. The reason is that Storage Transfer Service doesn't support remapping; that is, you cannot copy the path Bucketname/Data/files/ to Bucketname/data/202202/files.
My recommendation would be to first use the Storage Transfer Service to copy everything from one bucket to the other, and then use any of the available methods to rename the objects in the new bucket to Bucketname/data/202202/files.
Also, Cloud Storage uses a flat namespace; that is, Cloud Storage does not have folders and subfolders. There are a few documents that you can refer to for more information on this: Object name considerations and Folders.
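The copy-then-rename approach could be sketched like this. It assumes the date in the file name is dd-mm-yyyy (inferred from abc_01-02-2022.csv mapping to 202202) and that the objects have already been copied under the Data/files/ prefix; the function names are hypothetical.

```python
import re

# Matches the trailing date in names such as abc_01-02-2022.csv (dd-mm-yyyy).
_DATE_RE = re.compile(r"_(\d{2})-(\d{2})-(\d{4})\.csv$")


def destination_name(source_name):
    """Map Data/files/abc_01-02-2022.csv to data/202202/files/abc_01-02-2022.csv."""
    match = _DATE_RE.search(source_name)
    if match is None:
        raise ValueError(f"no dd-mm-yyyy date in object name: {source_name}")
    _day, month, year = match.groups()
    filename = source_name.rsplit("/", 1)[-1]
    return f"data/{year}{month}/files/{filename}"


def rename_copied_objects(bucket_name, prefix="Data/files/"):
    """Rename every object under `prefix` to its year-month layout."""
    # Imported here so destination_name works without the client library.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # Materialize the listing first, since renaming changes the object set.
    for blob in list(client.list_blobs(bucket_name, prefix=prefix)):
        bucket.rename_blob(blob, destination_name(blob.name))
```

A rename in Cloud Storage is a copy followed by a delete, so for a full year of files this makes one extra pass over the data.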
This is possible using the STS API. You can specify a "path" for the destination bucket.
I have some datasets (27 CSV files, separated by semicolons, summing 150+GB) that get uploaded every week to my Cloud Storage bucket.
Currently, I use the BigQuery console to organize that data manually, declaring the variables and changing the filenames 27 times. The first file replaces the entire previous database, then the other 26 get appended to it. The filenames are always the same.
How can I do it using Python?
Please check out Cloud Functions, which supports Python. After the function is deployed, you can create cron jobs to run it on a schedule. Here is a related question:
Run a python script on schedule on Google App Engine
Also, here is an article that describes how to load data from Cloud Storage: Loading CSV data from Cloud Storage
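The weekly load itself might look like the sketch below, which uses the google-cloud-bigquery client. It assumes the files have a header row and that schema autodetection is acceptable (the question declares the schema manually, so you may prefer to pass an explicit schema). `uris` and `table_id` are hypothetical inputs, for example a list of gs:// paths and "project.dataset.table".

```python
def plan_loads(uris):
    """Pair each URI with a write disposition: replace first, then append."""
    return [
        (uri, "WRITE_TRUNCATE" if index == 0 else "WRITE_APPEND")
        for index, uri in enumerate(uris)
    ]


def run_loads(uris, table_id):
    """Load semicolon-separated CSV files from Cloud Storage into one table."""
    # Imported here so plan_loads works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    for uri, disposition in plan_loads(uris):
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            field_delimiter=";",
            skip_leading_rows=1,  # assumption: each file has a header row
            autodetect=True,  # assumption: replace with an explicit schema
            write_disposition=disposition,
        )
        job = client.load_table_from_uri(uri, table_id, job_config=job_config)
        job.result()  # wait for this file before starting the next one
```

Running the loads sequentially keeps the "first file replaces, the rest append" order from the question intact.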
I am trying to upload files to Azure using only the SAS URI. I found ways to do this in C#, but not in Python. The only Python solution I found passes the account name and account key as parameters to BlockBlobService. Here is an example: Upload image to azure blob storage using python. I am trying to avoid that approach. Is there a way to upload CSV files to Azure using only the SAS URI? Thanks for your help :)
If you're using the latest Python blob SDK, azure-storage-blob 12.4.0, you can use code like the following (feel free to modify it as needed):
from azure.storage.blob import BlobClient

upload_file_path = "d:\\a11.csv"
sas_url = "https://xxx.blob.core.windows.net/test5/a11.csv?sastoken"

client = BlobClient.from_blob_url(sas_url)

with open(upload_file_path, 'rb') as data:
    client.upload_blob(data)

print("**file uploaded**")
This might help:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python#upload-blobs-to-a-container
Example is shown by using the Python SDK for Azure Storage
Is there any way to take a small data JSON object from memory and upload it to Google BigQuery without using the file system?
We have working code that uploads files to BigQuery. This code no longer works for us because our new script runs on Google App Engine which doesn't have write access to the file system.
We notice that our BigQuery upload configuration has a "MediaFileUpload" option with a data_path. We have been unable to find if there is some way to change the data_path to something in memory - or if there is another method apart from MediaFileUpload that could upload data from memory to BigQuery.
media_body=MediaFileUpload(
data_path,
mimetype='application/octet-stream'))
job = insert_request.execute()
We have seen solutions which recommend uploading data to Google Cloud Storage then referencing that file to upload to BigQuery for when datasets are large. Our dataset is small so this step seems like an unnecessary use of bandwidth, processing and time.
As a result, we are wondering if there is a way to take our small JSON object and upload it directly to BigQuery using Python?
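One possibility, offered as a sketch rather than a confirmed fix: google-api-python-client also provides `MediaIoBaseUpload`, which takes a file-like object instead of a path, so the JSON can be wrapped in an in-memory `io.BytesIO`. The helper names below are hypothetical, and the example assumes the load job expects newline-delimited JSON.

```python
import io
import json


def rows_to_ndjson_stream(rows):
    """Serialize a list of dicts to newline-delimited JSON held in memory."""
    payload = "\n".join(json.dumps(row) for row in rows)
    return io.BytesIO(payload.encode("utf-8"))


def build_media_body(rows):
    """Build an upload body from memory, for use in place of MediaFileUpload."""
    # Imported here so the serialization step works without the API client.
    from googleapiclient.http import MediaIoBaseUpload

    return MediaIoBaseUpload(
        rows_to_ndjson_stream(rows),
        mimetype="application/octet-stream",
        resumable=False,
    )
```

The returned object would be passed as `media_body` where `MediaFileUpload(data_path, ...)` is used today; nothing is written to the file system.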