aws lambda is not able to open file in write mode - python

I am trying to open a CSV file in write mode using csv.writer. It works fine locally, but when I try to do the same in AWS Lambda it says the file system is read-only. I am sure that I am opening the file in write-binary mode.
Below is the code for reference.
f = csv.writer(open('abc.csv','wb+'))
f.writerow(['botName','botVersion','utteranceString','count','distinctUsers','firstUtteredDate','lastUtteredDate','status'])
Below is the error I am getting:
[Errno 30] Read-only file system: 'abc.csv'
Edit 1:
The above error is fixed by adding /tmp/ to the file path, but I am not able to move the CSV file created in /tmp to the S3 bucket.
I used the code below:
s3_u.meta.client.upload_file('/tmp/' + output_filename, 'codepipelinedev', k)
This is generating an empty file in the S3 bucket, and it throws an error if I test with a non-existent file.
When I tried the same thing locally, the CSV files were created with the expected data, but while transferring those files I am getting empty files in our S3 bucket.
I would appreciate any help with this.
Thanks in advance.

AWS Lambda functions only have write access to the /tmp directory within the Lambda runtime environment. If you need to modify a file, you need to first copy it to /tmp and then modify it there.
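The empty file in S3 is most likely because the file object passed to csv.writer is never flushed or closed before the upload runs. A minimal sketch of the whole flow, assuming boto3 is available in the function and using the bucket name and column list from the question as placeholders:

import csv
import boto3

output_filename = 'abc.csv'                 # illustrative name
local_path = '/tmp/' + output_filename      # /tmp is the only writable path in Lambda

# The context manager flushes and closes the file before the upload starts.
with open(local_path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['botName', 'botVersion', 'utteranceString', 'count',
                     'distinctUsers', 'firstUtteredDate', 'lastUtteredDate', 'status'])

s3 = boto3.client('s3')
s3.upload_file(local_path, 'codepipelinedev', output_filename)  # bucket and key are examples

Note that in Python 3 csv.writer expects a file opened in text mode with newline='', not the 'wb+' mode shown in the question.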

Related

Delete google drive files by extension with PyDrive

I'm trying to delete all files with the extension '.pdf' from a google drive folder.
Everything is fine with the API authentication and I can upload the files. The problem is the delete.
Here I upload
upload_file = 'Test1.pdf'
gfile = drive.CreateFile({'parents': [{'id': '11SsSKYEATgn_VWzSb-8RjRL-VoIxvamC'}]})
gfile.SetContentFile(upload_file)
gfile.Upload()
Here I try to delete
delfile = drive.CreateFile({'parents': [{'id': '11SsSKYEATgn_VWzSb-8RjRL-VoIxvamC'}]})
filedel = "*.pdf"
delfile.SetContentFile(filedel)
delfile.Delete()
Error:
Traceback (most recent call last):
File "C:/Users/linol/Documents/ProjetoRPA-Python/RPA-TESTE.py", line 40, in <module>
delfile.SetContentFile(filedel)
File "C:\Users\linol\Documents\ProjetoRPA-Python\venv\lib\site-packages\pydrive\files.py", line 175, in SetContentFile
self.content = open(filename, 'rb')
OSError: [Errno 22] Invalid argument: '*.pdf'
I believe your goal and your current situation are as follows.
You want to delete the PDF files in a specific folder.
You want to achieve this using pydrive for python.
You have already been able to get and put values for Google Drive using Drive API.
In this case, I would like to propose the following flow.
Retrieve the file list of PDF files from the specific folder.
Delete the files using the file list.
When the above flow is reflected in a script, it becomes as follows.
Sample script:
Please modify ### to your folder ID.
# 1. Retrieve the file list of PDF files from the specific folder.
fileList = drive.ListFile({'q': "'###' in parents and mimeType='application/pdf'"}).GetList()
# 2. Delete the files using the file list.
for e in fileList:
    drive.CreateFile({'id': e['id']}).Trash()
    # drive.CreateFile({'id': e['id']}).Delete()  # When you use this, the files are completely deleted. Please be careful with this.
This sample script retrieves the files using the mimeType. When you want to retrieve the files using the filename, you can also use fileList = drive.ListFile({'q': "'###' in parents and title contains '.pdf'"}).GetList().
IMPORTANT: In this sample script, when Delete() is used, the files are completely deleted from Google Drive. So at first, I would recommend using Trash() instead of Delete() as a test of the script. That way the files are not deleted but moved to the trash, so you can test the script safely.
Note:
It seems that PyDrive uses Drive API v2. Please keep this in mind.
Reference:
PyDrive

how to upload .csv file to azure folder in python

Does anyone know how to upload a .csv file to a folder inside a blob container in Python?
I'm having difficulty trying to access the folders inside it.
I have the CSV and want to save it inside the blob folder, but it didn't work.
The file is generated in code, so I don't want to pass the directory where it is on disk.
csv = df.to_csv()
block_blob_service.create_blob_from_path(container_name, 'folder/csv/mycsv/' , csv)
Does anyone know how I can save the CSV directly to the folder inside the storage container (folder/csv/mycsv/) in Azure?
I got an error: stat: path too long for Windows.
Reading the documentation of DataFrame.to_csv, I believe the csv variable actually contains string data. If that's the case, then you will need to use the create_blob_from_text method.
So your code would be:
csv = df.to_csv()
block_blob_service.create_blob_from_text(container_name, 'folder/csv/mycsv/' , csv)
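For reference, a self-contained sketch of the flow, assuming the legacy azure-storage-blob 2.x SDK that BlockBlobService comes from; the account credentials, container name, and blob path are placeholders. "Folders" in blob storage are just prefixes in the blob name, so the name should end with an actual file name rather than a trailing slash:

import pandas as pd
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob 2.x SDK

block_blob_service = BlockBlobService(account_name='myaccount', account_key='...')  # placeholders
container_name = 'mycontainer'                                                       # placeholder

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # example data
csv_text = df.to_csv(index=False)              # to_csv() without a path returns a string

# The blob name carries the virtual "folder" prefix and ends with a file name.
block_blob_service.create_blob_from_text(container_name, 'folder/csv/mycsv/mycsv.csv', csv_text)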

How do I upload an .xlsx file to an FTP without creating a local file?

I am writing a script to pull XML files from an FTP, turn them into an .xlsx file, and re-upload it to a different directory on the same FTP. I want to create the .xlsx file within my script instead of copying the XML data into a template and uploading a local file.
I tried creating a filename for the .xlsx doc, but I realize that I need to save it before I can upload it to the FTP. My question is: would it be better to create a temporary folder on the server the script runs on and empty the folder afterwards, or is there a way to upload the doc without saving it anywhere (preferred)? I will be running the script on a Windows server.
ftps.cwd(ftpExcelDir)
wbFilename = str(orderID + '.xlsx')
savedFile = saving the file somewhere  # this is the part I'm having trouble with
ftps.storline('STOR ' + wbFilename, savedFile)
With the following code I can get the .xlsx files to save to the FTP, but I receive an invalid extension/corrupt file error from Excel:
ftps.cwd(ftpExcelDir)
wbFilename = str(orderID + '.xlsx')
inMemoryWB = io.BytesIO()
wb.save(inMemoryWB)
ftps.storbinary('STOR ' + wbFilename, inMemoryWB)
The FTP functions take file objects, but those don't, strictly speaking, need to be real files. Python has BytesIO and StringIO objects which act like files but are backed by memory. See: https://stackoverflow.com/a/44672691/8833934
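The corrupt-file error in the second snippet is most likely because the buffer's position sits at the end after wb.save(), so storbinary uploads zero bytes. A small sketch of the fix, reusing ftps, wb, ftpExcelDir, and orderID from the question:

import io

ftps.cwd(ftpExcelDir)
wbFilename = str(orderID + '.xlsx')

inMemoryWB = io.BytesIO()
wb.save(inMemoryWB)   # openpyxl writes the workbook into the in-memory buffer
inMemoryWB.seek(0)    # rewind so storbinary reads from the start of the buffer

ftps.storbinary('STOR ' + wbFilename, inMemoryWB)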

How to use AWS lambda to convert pdf files to .txt with python

I need to automate the conversion of many PDF files to text files using AWS Lambda in Python 3.7.
I've successfully converted PDF files using poppler/pdftotext, tika, and PyPDF2 on my own machine. However, tika times out or needs a Java instance running on a host machine, which I'm not sure how to set up. pdftotext needs poppler, and all the solutions for running that on Lambda seem to be outdated, or I'm just not familiar enough with binaries to make sense of them. PyPDF2 seems the most promising, but testing throws an error.
The code and error I'm getting for PyPDF2 are as follows:
pdf_file = open(s3.Bucket(my_bucket).download_file('test.pdf','test.pdf'),'rb')
"errorMessage": "[Errno 30] Read-only file system: 'test.pdf.3F925aC8'",
"errorType": "OSError",
and if I try to reference it directly,
pdf_file = open('https://s3.amazonaws.com/' + my_bucket + '/test.pdf', 'rb')
"errorMessage": "[Errno 2] No such file or directory: 'https://s3.amazonaws.com/my_bucket/test.pdf'",
"errorType": "FileNotFoundError",
AWS Lambda only allows you to write into the /tmp folder, so you should download the file and put it there.
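A minimal sketch of that approach, reusing s3 and my_bucket from the question; note that download_file returns None, so the /tmp path has to be opened separately rather than wrapping the call in open():

local_path = '/tmp/test.pdf'
s3.Bucket(my_bucket).download_file('test.pdf', local_path)  # save under /tmp, the only writable location
pdf_file = open(local_path, 'rb')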
As the error states, you are trying to write to a read-only file system. You are using the download_file method, which tries to save the file to 'test.pdf' and fails. Try using download_fileobj (link) together with an in-memory buffer (e.g. io.BytesIO) instead. Then feed that stream to PyPDF2.
Example:
import io
[...]
pdf_stream = io.BytesIO()   # the PDF is binary data, so use BytesIO rather than StringIO
object.download_fileobj(pdf_stream)
pdf_stream.seek(0)          # rewind the buffer before handing it to PyPDF2
pdf_obj = PdfFileReader(pdf_stream)
[...]

How do I read and write an excel sheet from and to S3 using python?

I have an Excel file in S3. My aim is to read that file, process it, and write it back. I have been using openpyxl for the read and write part, and it works locally. However, the same doesn't work when the file is located in S3.
The current architecture is as follows: a call is made to my Flask app where the URL to the file in S3 is passed as a parameter. The parameter is read as follows.
url = request.args.get('url')
In the case of a CSV file, the following had worked:
pandas.read_csv(url)
But when dealing with xlsx files, the following (with openpyxl):
file = load_workbook(filename = url)
corpus = file['Sheet']
is giving me the following error :
FileNotFoundError: [Errno 2] No such file or directory: 's3.amazonaws.com/data-file/prod/projects/Methane__-oil_and_gas-_-_Sheet1.xlsx'
How do I resolve this and read the file from S3? Also, after I am done processing, how do I write it back to S3?
You can pass a URL to pandas.read_csv as it automatically recognizes URLs, but it looks like the URL from your error is missing the protocol.
The url should be https://s3.amazonaws.com/data-file/prod/projects/Methane__-oil_and_gas-_-_Sheet1.xlsx
Try prepending https:// to the URL and see what happens.
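That only covers the CSV case, though; openpyxl's load_workbook does not fetch URLs, it expects a path or a file-like object. A hedged sketch of reading and writing the xlsx through in-memory buffers instead, assuming boto3 credentials are configured and with the bucket and key inferred from the error message:

import io
import boto3
from openpyxl import load_workbook

s3 = boto3.client('s3')
bucket = 'data-file'                                          # inferred from the error message
key = 'prod/projects/Methane__-oil_and_gas-_-_Sheet1.xlsx'

# Read: download the object into memory and open it with openpyxl.
obj = s3.get_object(Bucket=bucket, Key=key)
wb = load_workbook(filename=io.BytesIO(obj['Body'].read()))
corpus = wb['Sheet']

# ... process the workbook ...

# Write back: save the workbook into a buffer and upload it to S3.
out = io.BytesIO()
wb.save(out)
out.seek(0)
s3.put_object(Bucket=bucket, Key=key, Body=out)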
