Copy files from a Bitbucket repository to an S3 bucket using a Python script

I am running a Python script (through TeamCity) which deploys AWS CloudFormation templates. As a prerequisite, all the templates and script files in the Bitbucket repository should be copied to an S3 bucket before the deployment, and the Bitbucket directory structure should be maintained in S3 as well. How can a Python script copy files and directories from Bitbucket to S3?
I was using the os.walk method to accomplish this, but it's not copying the subdirectories or the files in subdirectories.
```python
pwd = os.getcwd()
print("Current Dir is: " + pwd)
print("Files in the Directory: ")
for subdir, dirs, files in os.walk(pwd):
    for file in files:
        print(file)
        try:
            object_name = file
            print("object name is " + object_name)
            response = client_bucket.upload_file(file, bucket_name, object_name)
        except:
            print("Error processing account " + account_id)
            for line in str(traceback.format_exc()).splitlines():
                print(line)
```
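A likely reason the subdirectory files are missing: inside the loop, `file` is only the base name, so `upload_file(file, ...)` looks for it in the current directory (the bare `except` then hides the error), and the object key drops the subfolder. Joining `subdir` with the file name, and deriving the key from the path relative to the starting directory, keeps the Bitbucket layout. This is a minimal sketch, assuming `client` is a boto3 S3 client (e.g. `boto3.client("s3")`); `build_s3_key`, `upload_tree`, the `prefix` parameter, and the bucket name in the usage comment are hypothetical.

```python
import os


def build_s3_key(local_path, root_dir, prefix=""):
    """Turn a local file path into an S3 key that keeps the directory layout.

    S3 keys always use "/" as the separator, so os.sep is normalised.
    """
    relative = os.path.relpath(local_path, root_dir)
    key = relative.replace(os.sep, "/")
    if prefix:
        key = prefix.rstrip("/") + "/" + key
    return key


def upload_tree(client, root_dir, bucket_name, prefix=""):
    """Upload every file under root_dir, including subdirectories.

    `client` is assumed to be a boto3 S3 client; only its
    upload_file(Filename, Bucket, Key) method is used.
    Returns the list of keys that were uploaded.
    """
    uploaded = []
    for subdir, _dirs, files in os.walk(root_dir):
        for name in files:
            # Use the full path: a bare file name only resolves in the cwd.
            local_path = os.path.join(subdir, name)
            key = build_s3_key(local_path, root_dir, prefix)
            client.upload_file(local_path, bucket_name, key)
            uploaded.append(key)
    return uploaded


# Hypothetical usage from the TeamCity build directory:
# import boto3
# upload_tree(boto3.client("s3"), os.getcwd(), "my-deployment-bucket",
#             prefix="templates")
```

Passing the client in, rather than using a global, also makes the function easy to exercise with a stand-in object before pointing it at a real bucket.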

Related

Generate a zip folder and add it to storage in Django

Hi guys, I need to generate a zip file from a bunch of images and add it as a file to the database.
This is the code to generate the file:
```python
def create_zip_folder(asin):
    print('creating zip folder for asin: ' + asin)
    asin_object = Asin.objects.get(asin=asin)
    # create folder
    output_dir = f"/tmp/{asin}"
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    # download images
    for img in asin_object.asin_images.all():
        urllib.request.urlretrieve(img.url, f"{output_dir}/{img.id}.jpg")
    # zip all files in output_dir
    zip_file = shutil.make_archive(asin, 'zip', output_dir)
    asin_object.zip_file = zip_file
    asin_object.has_zip = True
    asin_object.save()
    # delete folder
    shutil.rmtree(output_dir)
    return True
```
This all works, and I can see the files generated in my editor, but when I try to access the file in the template with asin.zip_file.url I get this error:
SuspiciousOperation at /history/
Attempted access to '/workspace/B08MFR2DRS.zip' denied.
Why is this happening? I thought the file would be uploaded to the file storage through the model, but apparently it's in a restricted folder. This happens both in development (with local file storage) and in production (with an S3 bucket as file storage).
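A probable cause, offered as a hedged reading of the code: `shutil.make_archive(asin, 'zip', output_dir)` writes `B08MFR2DRS.zip` into the current working directory (here `/workspace`) and returns its absolute path. Assigning that string to `zip_file` only records the absolute path as the file's name; the file never passes through the storage backend. `Attempted access to '...' denied.` is the message Django's `safe_join` raises when a path resolves outside the storage root, which an absolute name like that does. One way around this is to build the archive in memory and hand it to `FieldFile.save()`, so the configured storage (local or S3) writes it under the field's `upload_to`. The sketch assumes `zip_file` is a `FileField`; `build_zip_bytes` is a hypothetical helper, and the Django part is left as comments because it needs a configured project.

```python
import io
import os
import zipfile


def build_zip_bytes(file_paths):
    """Zip the given files in memory and return the archive as bytes.

    Each file is stored under its base name, so no absolute paths
    end up inside the archive.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()


# Hypothetical use inside create_zip_folder (requires Django):
# import glob
# from django.core.files.base import ContentFile
# data = build_zip_bytes(glob.glob(f"{output_dir}/*.jpg"))
# # FieldFile.save() passes the content to the storage backend, which
# # chooses the final name; the model is saved once at the end.
# asin_object.zip_file.save(f"{asin}.zip", ContentFile(data), save=False)
# asin_object.has_zip = True
# asin_object.save()
```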

Automating running Python code using Azure services

Hi everyone on Stackoverflow,
I wrote two Python scripts. One picks up local files and sends them to GCS (Google Cloud Storage). The other does the opposite: it takes files that were uploaded to GCS and saves them locally.
I want to automate the process using Azure.
What would you recommend: an Azure Function App, an Azure Logic App, or other services?
I'm now trying to use a Logic App. I made an .exe file using PyInstaller and am looking for a connector in Logic Apps that will run my program (the .exe file). I have the trigger "When a file is added or modified" in my Logic App, but now I'm stuck on selecting the next step (connector).
Kind regards,
Anna
Adding code as requested:
```python
from google.cloud import storage
import os
import glob
import json

# Finding path to config file that is called "gcs_config.json" in directory C:/
def find_config(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

def upload_files(config_file):
    # Reading 3 Parameters for upload from JSON file
    with open(config_file, "r") as file:
        contents = json.loads(file.read())
        print(contents)
    # Setting up login credentials
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = contents['login_credentials']
    # The ID of GCS bucket
    bucket_name = contents['bucket_name']
    # Setting path to files
    LOCAL_PATH = contents['folder_from']
    for source_file_name in glob.glob(LOCAL_PATH + '/**'):
        # For multiple files upload
        # Setting destination folder according to file name
        if os.path.isfile(source_file_name):
            partitioned_file_name = os.path.split(source_file_name)[-1].partition("-")
            file_type_name = partitioned_file_name[0]
            # Setting folder where files will be uploaded
            destination_blob_name = file_type_name + "/" + os.path.split(source_file_name)[-1]
            # Setting up required variables for GCS
            storage_client = storage.Client()
            bucket = storage_client.bucket(bucket_name)
            blob = bucket.blob(destination_blob_name)
            # Running upload and printing confirmation message
            blob.upload_from_filename(source_file_name)
            print("File from {} uploaded to {} in bucket {}.".format(
                source_file_name, destination_blob_name, bucket_name
            ))

config_file = find_config("gcs_config.json", "C:/")
upload_files(config_file)
```
config.json:
```json
{
  "login_credentials": "C:/Users/AS/Downloads/bright-velocity-___-53840b2f9bb4.json",
  "bucket_name": "staging.bright-velocity-___.appspot.com",
  "folder_from": "C:/Users/AS/Documents/Test2/",
  "folder_for_downloaded_files": "C:/Users/AnnaShepilova/Documents/DownloadedFromGCS2/",
  "given_date": "",
  "given_prefix": ["Customer", "Account"]
}
```
Currently, there is no built-in connector in Logic Apps for interacting with Google Cloud services. However, Google Cloud Storage does provide a REST API, which you can call from your Logic App or Function App.
My suggestion, though, is to use an Azure Function for this, because an Azure Function gives you more flexibility to write your own flow for the task.
Refer to the guidance on running your .exe file in an Azure Function, whether you are using a local EXE or an EXE in the cloud environment.
Refer here for more information
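For the REST API route mentioned above, the Cloud Storage JSON API accepts a simple ("media") upload as a `POST` to `https://storage.googleapis.com/upload/storage/v1/b/BUCKET/o?uploadType=media&name=OBJECT`, authorised with an OAuth 2.0 bearer token. The sketch below only builds that request with the standard library so its shape is visible; obtaining the access token is assumed and out of scope, and `build_gcs_upload_request` is a hypothetical helper. A Logic App HTTP action would send the same method, URL, and headers.

```python
from urllib.parse import quote
from urllib.request import Request

GCS_UPLOAD_URL = "https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o"


def build_gcs_upload_request(bucket, object_name, data, access_token,
                             content_type="application/octet-stream"):
    """Build (but do not send) a GCS JSON API simple-upload request.

    `access_token` is assumed to be a valid OAuth 2.0 token with
    write access to the bucket.
    """
    # Object names such as "Customer/Customer-1.csv" contain "/",
    # which must be percent-encoded inside the query string.
    url = (GCS_UPLOAD_URL.format(bucket=quote(bucket, safe=""))
           + "?uploadType=media&name=" + quote(object_name, safe=""))
    headers = {
        "Authorization": "Bearer " + access_token,
        "Content-Type": content_type,
    }
    return Request(url, data=data, headers=headers, method="POST")


# Sending it requires network access and a real token, e.g.:
# from urllib.request import urlopen
# req = build_gcs_upload_request("my-bucket", "Customer/Customer-1.csv",
#                                payload_bytes, token, "text/csv")
# with urlopen(req) as resp:
#     print(resp.status)
```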

How to iterate over a folder on S3 using boto3?

On my instance, I have a folder in S3 with N files. I need to iterate over it using the script below, get all the files, and convert them. This script is hosted on an EC2 instance running Django.
I've tried a lot, using the boto3 function get_object, but all I get is nothing.
Can someone please tell me how I can do something like that? Do I need to download these files before converting them, or can I do it directly?
```python
def upload_folder_to_s3(local_folder, destination_folder, s3_bucket):
    '''
    Function to upload a specific local folder to S3 bucket.
    Parameters:
        local_folder (str): Path to local folder.
        destination_folder (str): Path to destination folder on S3.
        s3_bucket (str): Bucket name on S3.
    Return:
        local_folder (str): The local folder that was processed.
    '''
    # Global variables
    global client
    # Iterate over files on folder
    for root, dirs, files in os.walk(local_folder):
        for filename in files:
            print(filename)
            # construct the full local path
            local_path = os.path.join(root, filename)
            # construct the full S3 path
            relative_path = os.path.relpath(local_path, local_folder)
            s3_path = os.path.join(destination_folder, relative_path)
            # relative_path = os.path.relpath(os.path.join(root, filename))
            print('Searching "%s" in "%s"' % (s3_path, s3_bucket))
            try:
                client.get_object(Bucket=s3_bucket, Key=s3_path)
                print("Path found on S3! Skipping %s..." % s3_path)
                # try:
                #     client.delete_object(Bucket=bucket, Key=s3_path)
                # except:
                #     print "Unable to delete %s..." % s3_path
            except:
                print("Uploading %s..." % s3_path)
                client.upload_file(local_path, s3_bucket, s3_path)
    return local_folder
```
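On the iteration part of the question: `get_object` reads one known key; it does not list a folder. Listing is done with `list_objects_v2`, passing the folder as `Prefix`, and because each call returns at most 1,000 keys, a paginator is the usual way to walk all of them. Each object's `Body` is a stream that can be read straight into memory, so downloading to disk before converting is optional. This is a minimal sketch, assuming `client` is a boto3 S3 client; `iter_s3_folder` and the `convert` step in the usage comment are hypothetical.

```python
def iter_s3_folder(client, bucket, prefix):
    """Yield (key, bytes) for every object under `prefix` in `bucket`.

    `client` is assumed to be a boto3 S3 client. list_objects_v2 returns
    at most 1000 keys per call, so a paginator walks all pages.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            body = client.get_object(Bucket=bucket, Key=key)["Body"]
            yield key, body.read()


# Hypothetical usage on the EC2 instance:
# import boto3
# for key, data in iter_s3_folder(boto3.client("s3"), "my-bucket", "input"):
#     convert(key, data)  # your conversion step, working on bytes in memory
```

Reading each body fully suits modest file sizes; for very large objects, streaming from `Body` in chunks or using `download_file` would avoid holding everything in memory.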

Download multiple files from Google Cloud Storage using Python

I am trying to download multiple files from a Google Cloud Storage folder. I am able to download a single file but not multiple files. I based this on the reference at this link, but it does not seem to work.
The code is as follows:
```python
# [download multiple files]
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder = "/projects/bigquery/download/shakespeare/"
# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)
    # Retrieve all blobs with a prefix matching the folder
    bucket = storage_client.get_bucket(bucket_name)
    print(bucket)
    blobs = list(bucket.list_blobs(prefix=folder))
    print(blobs)
    for blob in blobs:
        if(not blob.name.endswith("/")):
            blob.download_to_filename(blob.name)
# [End download to multiple files]
```
Is there any way to download multiple files matching a pattern (name) or something similar? Since I am exporting the files from BigQuery, the file names will be something like this:
shakespeare-000000000000.csv.gz
shakespeare-000000000001.csv.gz
shakespeare-000000000002.csv.gz
shakespeare-000000000003.csv.gz
Reference: working code to download a single file:
```python
# [download to single files]
edgenode_destination_uri = '/projects/bigquery/download/shakespeare-000000000000.csv.gz'
bucket_name = 'bigquery-hive-load'
gcs_bucket = storage_client.get_bucket(bucket_name)
blob = gcs_bucket.blob("shakespeare.csv.gz")
blob.download_to_filename(edgenode_destination_uri)
logging.info('Downloaded {} to {}'.format(
    gcs_bucket, edgenode_destination_uri))
# [end download to single files]
```
After some trial, I solved this and couldn't stop myself from posting here as well.
```python
bucket_name = 'mybucket'
folder = '/projects/bigquery/download/shakespeare/'
delimiter = '/'
file = 'shakespeare'
# Retrieve all blobs with a prefix matching the file.
bucket = storage_client.get_bucket(bucket_name)
# List blobs iterate in folder
blobs = bucket.list_blobs(prefix=file, delimiter=delimiter)  # Excluding folder inside bucket
for blob in blobs:
    print(blob.name)
    destination_uri = '{}/{}'.format(folder, blob.name)
    blob.download_to_filename(destination_uri)
```
It looks like you may simply have the wrong level of indentation in your Python code. The block beginning with # Retrieve all blobs with a prefix matching the folder is within the scope of the if above, so it's never executed if the folder already exists.
Try this:
```python
# [download multiple files]
bucket_name = 'bigquery-hive-load'
# The "folder" where the files you want to download are
folder = "/projects/bigquery/download/shakespeare/"
# Create this folder locally
if not os.path.exists(folder):
    os.makedirs(folder)
# Retrieve all blobs with a prefix matching the folder
bucket = storage_client.get_bucket(bucket_name)
print(bucket)
blobs = list(bucket.list_blobs(prefix=folder))
print(blobs)
for blob in blobs:
    if(not blob.name.endswith("/")):
        blob.download_to_filename(blob.name)
# [End download to multiple files]
```
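On the pattern part of the question: `list_blobs(prefix=...)` filters by prefix on the server, and a shell-style pattern such as `shakespeare-*.csv.gz` can then be applied on the client with `fnmatch`. The sketch also saves each blob under a chosen local directory by its base name, whereas `blob.download_to_filename(blob.name)` writes to a path relative to the working directory and fails if the intermediate directories don't exist. `download_matching` is a hypothetical helper, and `bucket` is assumed to be a `google.cloud.storage` `Bucket`; only `list_blobs(prefix=...)` and `Blob.download_to_filename()` are used.

```python
import fnmatch
import os


def download_matching(bucket, local_dir, prefix, pattern):
    """Download blobs whose name starts with `prefix` and whose base name
    matches the shell-style `pattern`. Returns the local paths written.
    """
    os.makedirs(local_dir, exist_ok=True)
    written = []
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):
            continue  # placeholder "directory" objects have no content
        base = blob.name.rsplit("/", 1)[-1]
        # GCS names are case-sensitive, so match case-sensitively too.
        if not fnmatch.fnmatchcase(base, pattern):
            continue
        # Save by base name so no intermediate local directories are needed.
        target = os.path.join(local_dir, base)
        blob.download_to_filename(target)
        written.append(target)
    return written


# Hypothetical usage with the BigQuery export names from the question:
# from google.cloud import storage
# bucket = storage.Client().bucket("bigquery-hive-load")
# download_matching(bucket, "/projects/bigquery/download/shakespeare",
#                   prefix="shakespeare-", pattern="shakespeare-*.csv.gz")
```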

How to upload an empty sub-folder to S3 using Python

The following code works fine, except that if a subfolder has no files inside it, that subfolder will not appear in S3. For example, if /home/temp/subfolder has no files, then subfolder will not show up in S3. How can I change the code so that the empty folder is also uploaded to S3?
I tried to write something (see the note in the code below), but I do not know how to call put_object() for the empty subfolder.
```python
#!/usr/bin/env python
import os
from boto3.session import Session

path = "/home/temp"
session = Session(aws_access_key_id='XXX', aws_secret_access_key='XXX')
s3 = session.resource('s3')
for subdir, dirs, files in os.walk(path):
    # note: if not files ......
    for file in files:
        full_path = os.path.join(subdir, file)
        with open(full_path, 'rb') as data:
            s3.Bucket('my_bucket').put_object(Key=full_path[len(path)+1:],
                                              Body=data)
```
Besides, I tried calling this function to check whether a subfolder or file exists. It works for files, but not for subfolders. How can I check whether a subfolder exists? (If a subfolder exists, I will not upload it.)
```python
import botocore.exceptions

def check_exist(s3, bucket, key):
    try:
        s3.Object(bucket, key).load()
    except botocore.exceptions.ClientError as e:
        return False
    return True
```
BTW, I adapted the above code from
check if a key exists in a bucket in s3 using boto3
and
http://www.developerfiles.com/upload-files-to-s3-with-python-keeping-the-original-folder-structure/
Thanks to them for sharing the code.
Directories (folders, subfolders, etc.) do not exist in S3.
When you copy the file /mydir/myfile.txt to an empty S3 bucket, only the file myfile.txt is copied to S3. The directory mydir is not created, because that string is part of the object name mydir/myfile.txt. The actual object name is the full path; no subdirectories exist or are created.
S3 simulates directories by using a prefix when listing objects in the bucket. If you specify mydir/, then all of the S3 objects that start with mydir/ will be returned, including objects such as mydir/anotherfolder/myotherfile.txt. S3 supports a delimiter such as / so that the appearance of subdirectories can be created.
Note: There is no / at the beginning of a file name for S3 objects.
Listing Keys Hierarchically Using a Prefix and Delimiter
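Following this answer, an "empty folder" can only be represented by an object whose key ends in `/` with an empty body, which is also what the S3 console creates for its "Create folder" button. A subfolder then "exists" if any key starts with `subfolder/`, which a `list_objects_v2` call with `MaxKeys=1` can check, and `Delimiter="/"` returns the immediate subfolders as `CommonPrefixes`. This is a minimal sketch, assuming `client` is a boto3 S3 client (e.g. `session.client('s3')` rather than the resource used in the question); the function names are hypothetical, and `list_subfolders` reads only the first page of up to 1,000 results.

```python
import os


def upload_tree_with_empty_dirs(client, root_dir, bucket):
    """Upload files under root_dir, plus a zero-byte "dir/" marker object
    for each directory with no files and no subdirectories.

    A directory containing only empty subdirectories gets no marker of its
    own, because its children's markers already imply its prefix.
    """
    for subdir, dirs, files in os.walk(root_dir):
        rel = os.path.relpath(subdir, root_dir).replace(os.sep, "/")
        for name in files:
            key = name if rel == "." else rel + "/" + name
            client.upload_file(os.path.join(subdir, name), bucket, key)
        if not files and not dirs and rel != ".":
            # S3 has no real directories; an empty object whose key ends
            # in "/" is what shows up as an empty folder.
            client.put_object(Bucket=bucket, Key=rel + "/", Body=b"")


def prefix_exists(client, bucket, prefix):
    """Return True if at least one key starts with prefix + "/"."""
    prefix = prefix.rstrip("/") + "/"
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0


def list_subfolders(client, bucket, prefix=""):
    """List the immediate "subfolders" under prefix using Delimiter="/"."""
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    return [p["Prefix"] for p in resp.get("CommonPrefixes", [])]
```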
