Open and save an Excel file in S3 using Python

I have a problem with an Excel (xlsx) file. I just want to perform an open and save operation on it using Python code. This is what I tried, but I couldn't get it to work:
cursor = context.cursor()
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
objects = bucket.objects.all()
for obj in objects:
    if obj.key.startswith('path/filename'):
        filename = obj.key
        openok = open(obj)
        readok = openok.readlines()
        readok.close()
        print('file opened and closed successfully')

As far as I know, you can't read or interact with files directly on S3 using the built-in open().
I'd recommend downloading the file locally and then opening it. You can use the built-in tempfile module if you want to save it to a temporary path:
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    local_file_path = os.path.join(tmpdir, "tmpfile")
    bucket.download_file(obj.key, local_file_path)
    with open(local_file_path) as openok:
        readok = openok.readlines()
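Note that an xlsx file is binary, so readlines() won't give you anything meaningful even after downloading. A minimal sketch of an open-and-save round trip, assuming the openpyxl package is available and using placeholder bucket/key names:
import os
import tempfile

import boto3
import openpyxl  # assumption: openpyxl is installed

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')  # placeholder bucket name
key = 'path/filename.xlsx'    # placeholder key

with tempfile.TemporaryDirectory() as tmpdir:
    local_path = os.path.join(tmpdir, 'file.xlsx')
    bucket.download_file(key, local_path)
    wb = openpyxl.load_workbook(local_path)  # open
    wb.save(local_path)                      # save
    bucket.upload_file(local_path, key)      # write the result back to S3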

Related

Uploading files from sftp to an s3 bucket [duplicate]

I am using Paramiko to access a remote SFTP folder, and I'm trying to write code that transfers files from an SFTP path (with simple logic that uses the file metadata to check its last-modified date) to an AWS S3 bucket.
I have set up the connection to S3 using Boto3, but I still can't seem to write working code that transfers the files without downloading them to a local directory first. Here is some code I tried using Paramiko's getfo() method, but it doesn't work:
for f in files:
    # get last modified date from file metadata
    last_modified = sftp.stat(remote_path + f).st_mtime
    last_modified_date = datetime.fromtimestamp(last_modified).date()
    if last_modified_date > date_limit:  # check limit
        print('getting ' + f)
        full_path = f"{folder_path}{f}"
        fo = sftp.getfo(remote_path + f, f)
        s3_conn.put_object(Body=fo, Bucket=s3_bucket, Key=full_path)
Thank you!
Use Paramiko SFTPClient.open to get a file-like object that you can pass to Boto3 Client.put_object:
with sftp.open(remote_path + f, "r") as f:
    f.prefetch()
    s3_conn.put_object(Body=f, Bucket=s3_bucket, Key=full_path)
For the purpose of the f.prefetch(), see Reading file opened with Python Paramiko SFTPClient.open method is slow.
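Putting the pieces together, a minimal sketch of the whole transfer loop, assuming sftp, s3_conn, files, remote_path, folder_path, s3_bucket, and date_limit are defined as in the question:
from datetime import datetime

for name in files:
    # filter on the last-modified date from the SFTP metadata
    last_modified = sftp.stat(remote_path + name).st_mtime
    if datetime.fromtimestamp(last_modified).date() > date_limit:
        # stream the remote file straight into S3, no local copy
        with sftp.open(remote_path + name, "r") as fl:
            fl.prefetch()  # speeds up sequential reads over SFTP
            s3_conn.put_object(Body=fl, Bucket=s3_bucket, Key=f"{folder_path}{name}")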
For the opposite direction, see:
Transfer file from AWS S3 to SFTP using Boto 3

How can I read a PDF file in AWS S3 with boto3 in Python?

I would like to read .pdf files in an S3 bucket, but the problem is that it returns raw bytes, whereas if the file is a .csv or .txt the same code works fine.
What's wrong with .pdf files?
The code:
import boto3

s3client = boto3.client('s3')
fileobj = s3client.get_object(
    Bucket=BUCKET_NAME,
    Key='file.pdf'
)
filedata = fileobj['Body'].read()
contents = filedata
print(contents)
It returns:
b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (Architecture technique)\n/Producer (Skia/PDF m99 Google Docs Renderer)>>\nendobj\n3 0 obj\n<</ca 1\n/BM /Normal>>\nendobj\n6 0 obj\n<</Type /XObject\n/Subtype /Image\n/Width 1424\n/Height 500\n/ColorSpace /DeviceRGB\n/SMask 7 0 R\n/BitsPerComponent 8\n/Filter /FlateDecode\n/Length 26885>> stream\nx\x9c\xed\xdd\xeb\x93$Y\x99\xe7\xf7'
Another solution that I tried, but it did not work either:
import boto3
from PyPDF2 import PdfFileReader
from io import BytesIO

s3 = boto3.resource('s3')
obj = s3.Object(BUCKET_NAME, 'file.pdf')
fs = obj.get()['Body'].read()
pdfFile = PdfFileReader(BytesIO(fs))
It returns:
<PyPDF2.pdf.PdfFileReader at 0x7efbc8aead00>
Start by writing some Python code to access a PDF file on your local disk (search for a Python PDF library on the web).
Once you have that working, then you can look at reading the file from Amazon S3.
When reading a file from S3, you have two options:
Use fileobj['Body'].read() (as you already are doing) to obtain the bytes from the file directly, or
Use download_file() to download the file from S3 to the local disk, then process the file from disk
Which method to choose will depend upon the PDF library that you choose to use.
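For what it's worth, the second attempt in the question is actually most of the way there: PdfFileReader(...) returns a reader object, and you still need to ask it for pages or text. A minimal sketch, assuming an older PyPDF2 release where PdfFileReader and extractText() exist (newer pypdf renames these to PdfReader and extract_text()) and BUCKET_NAME as in the question:
import boto3
from io import BytesIO
from PyPDF2 import PdfFileReader

s3 = boto3.resource('s3')
obj = s3.Object(BUCKET_NAME, 'file.pdf')
reader = PdfFileReader(BytesIO(obj.get()['Body'].read()))

# pull the text out page by page
for page_num in range(reader.getNumPages()):
    print(reader.getPage(page_num).extractText())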

Can you temporarily copy a Google Cloud image file in Python?

For a project, I'm trying to get an uploaded image file, stored in a bucket. I'm trying to have Python save a copy temporarily, just to perform a few tasks on this file (read, decode and give the decoded file back as JSON). After this is done, the temp file needs to be deleted.
I'm using Python 3.8, if that helps at all.
If you want some snippets of what I tried, I'm happy to provide :)
Edit:
So far, I have tried just downloading the file from the bucket, which works. But I can't figure out how to save it temporarily just to decode it (I have an API that decodes the image and extracts data from it). This is the code for downloading:
import os

from google.cloud import storage

storage_client = storage.Client()

def download_file_from_bucket(blob_name, file_path, bucket_name):
    try:
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(blob_name)
        with open(file_path, 'wb') as f:
            storage_client.download_blob_to_file(blob, f)
    except Exception as e:
        print(e)
        return False

bucket_name = 'white-cards-with-qr'
download_file_from_bucket('My first Blob Image', os.path.join(os.getcwd(), 'file2.jpg'), bucket_name)
For object stores in a cloud environment, you can sign your object to give access to users who don't have an account for that object; you may read the Google Cloud documentation on signed URLs for this.
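A minimal sketch of that, assuming the google-cloud-storage client and the bucket/blob names from the question (signing requires credentials that can sign, e.g. a service account key):
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket('white-cards-with-qr').blob('My first Blob Image')

# generate a URL that grants read access for 15 minutes
url = blob.generate_signed_url(expiration=timedelta(minutes=15), method='GET')
print(url)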
You can use the tempfile library. This is a really basic snippet. You can also name the file or read it after writing it.
import tempfile

temp = tempfile.TemporaryFile()
try:
    temp.write(blob.download_as_bytes())  # write the blob's bytes, not the blob object
finally:
    temp.close()
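If the decoding API needs a real path on disk, tempfile.NamedTemporaryFile may be closer to what you want; the file is deleted automatically when the with block exits. A sketch, where decode_image_file is a placeholder for your decoding API:
import tempfile
from google.cloud import storage

def decode_blob(bucket_name, blob_name):
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # the temp file is removed automatically when the with block exits
    with tempfile.NamedTemporaryFile(suffix='.jpg') as tmp:
        blob.download_to_filename(tmp.name)
        return decode_image_file(tmp.name)  # placeholder: your decoding API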

How to save a file as zip without saving it to a local folder

I'm trying to create a download function for my Streamlit app. What I currently have lets me download a zip file via a button in the app, but unfortunately it also saves the file to my local folder, which I don't want. The problem is when I initialize the file_zip object: I want the zip file to have a specific name, ideally the name of the file the user uploaded plus a '.zip' extension (i.e. datafile, a parameter of the function, carries that file name). But every time I do that, it keeps saving the zip file to my local folder. Is there an alternative? BTW, I'm trying to save a list of pandas DataFrames into one zip file.
def downloader(list_df, datafile, file_type):
    file = datafile.name.split(".")[0]
    # create zip file
    with zipfile.ZipFile("{}.zip".format(file), 'w', zipfile.ZIP_DEFLATED) as file_zip:
        for i in range(len(list_df)):
            file_zip.writestr(file + "_group_{}".format(i) + ".csv", pd.DataFrame(list_df[i]).to_csv())
    # pass it to the front end for download
    zip_name = "{}.zip".format(file)
    with open(zip_name, "rb") as f:
        bytes = f.read()
        b64 = base64.b64encode(bytes).decode()
        href = f'<a href="data:application/zip;base64,{b64}" download="{zip_name}">Click Here To Download</a>'
        st.markdown(href, unsafe_allow_html=True)
It sounds like you want to create the zip file in memory and use it later to build a base64 encoding. You can use an io.BytesIO() object with ZipFile, rewind it, and read the data back for base64 encoding.
import base64
import io
import zipfile

import pandas as pd
import streamlit as st

def downloader(list_df, datafile, file_type):
    file = datafile.name.split(".")[0]
    # create the zip file in memory
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, 'w', zipfile.ZIP_DEFLATED) as file_zip:
        for i in range(len(list_df)):
            file_zip.writestr(file + "_group_{}".format(i) + ".csv", pd.DataFrame(list_df[i]).to_csv())
    # rewind, then pass it to the front end for download
    zip_buf.seek(0)
    zip_name = "{}.zip".format(file)
    b64 = base64.b64encode(zip_buf.read()).decode()
    href = f'<a href="data:application/zip;base64,{b64}" download="{zip_name}">Click Here To Download</a>'
    st.markdown(href, unsafe_allow_html=True)
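As a side note, newer Streamlit releases (0.88+) ship st.download_button, which accepts the bytes directly and avoids the base64/HTML workaround; a sketch assuming zip_buf and zip_name were built as above:
st.download_button(
    label="Click Here To Download",
    data=zip_buf.getvalue(),  # the in-memory zip built above
    file_name=zip_name,
    mime="application/zip",
)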

Fetch a remote zip file and list the files within

I'm working on a small Google App Engine project in which I need to fetch a remote zip file from a URL and then list the files contained in the zip archive.
I'm using the zipfile module.
Here's what I've come up with so far:
# fetch the zip file from its remote URL
result = urlfetch.fetch(zip_url)
# store the contents in a stream
file_stream = StringIO.StringIO(result.content)
# create the ZipFile object
zip_file = zipfile.ZipFile(file_stream, 'w')
# read the files by name
archive_files = zip_file.namelist()
Unfortunately the archive_files list is always of length 0.
Any ideas what I'm doing wrong?
You are opening the file with mode 'w', which truncates it. Change it to mode 'r' for reading:
zip_file = zipfile.ZipFile(file_stream, 'r')
Reference: http://docs.python.org/library/zipfile.html#zipfile-objects
You're opening the ZipFile for writing. Try reading instead.
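Outside App Engine's urlfetch, a present-day equivalent using only the standard library might look like this (zip_url is a placeholder):
import io
import urllib.request
import zipfile

zip_url = 'https://example.com/archive.zip'  # placeholder URL

# fetch the archive and keep it in memory
with urllib.request.urlopen(zip_url) as resp:
    data = resp.read()

# open for reading ('r'), then list the member names
with zipfile.ZipFile(io.BytesIO(data), 'r') as zf:
    print(zf.namelist())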
