Openshift how to validate file before saving into app-root/data - python

I'm trying to find a way to use the "identify" function from ImageMagick without saving the image file in app-root/data. Basically, I want to validate the image file before actually saving it to its destination. If the file is already saved in app-root/data, I can simply do this:
(temp, tempError) = (subprocess.Popen(['identify', '../data/' + filename + extension], stdout=subprocess.PIPE)).communicate()
But this requires the image to be uploaded before it can be identified. Is there a way to avoid that?

You should save the file into the /tmp directory first, then validate it, and if it passes, then move it to your ~/app-root/data directory.
You can check out this Filesystem Overview on the Developer Portal for more information: https://developers.openshift.com/en/managing-filesystem.html
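The save, validate, then move flow described above could be sketched roughly as follows. This is a minimal sketch, not OpenShift-specific code: the helper names identify_ok and save_if_valid are hypothetical, the destination path is whatever the caller passes in, and it assumes the ImageMagick identify binary is on the PATH and exits with a non-zero status for files it cannot read.

```python
import os
import shutil
import subprocess
import tempfile


def identify_ok(path):
    """Return True if ImageMagick's `identify` exits with status 0 for `path`.

    Assumption: `identify` is installed and on the PATH.
    """
    try:
        result = subprocess.run(
            ['identify', path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except FileNotFoundError:
        # The `identify` executable is not available, so nothing can be validated.
        return False
    return result.returncode == 0


def save_if_valid(data, dest_path, validator=identify_ok, suffix=''):
    """Write `data` to a temp file, validate it, and move it to `dest_path`.

    Returns True if the file was accepted and moved, False otherwise.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(data)
        if not validator(tmp_path):
            return False
        shutil.move(tmp_path, dest_path)
        return True
    finally:
        # Clean up the temp file if it was rejected or the move failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

The validator is passed in as a parameter so that the same flow can be reused with a different check, or exercised without ImageMagick installed.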

Related

Archive files directly from memory in Python

I'm writing a program where I get a number of files and zip them with encryption using pyzipper. I'm also using io.BytesIO() to write these files so that they stay in memory. Now, after some other additions, I want to take all of these in-memory files and zip them together into a single encrypted zip file using the same pyzipper.
The code looks something like this:
# Create the in-memory file object
in_memory = BytesIO()
# Create the zip file and open in write mode
with pyzipper.AESZipFile(in_memory, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zip_file:
    # Set password
    zip_file.setpassword(b"password")
    # Save "data" with file_name
    zip_file.writestr(file_name, data)
# Go to the beginning
in_memory.seek(0)
# Read the zip file data
data = in_memory.read()
# Add the data to a list
files.append(data)
So, as you may guess the "files" list is an attribute from a class and the whole thing above is a function that does this a number of times and then you get the full files list. For simplicity's sake, I removed most of the irrelevant parts.
I get no errors for now, but when I try to write all files to a new zip file I get an error. Here's the code:
with pyzipper.AESZipFile(test_name, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for file in files:
        zfile.write(file)
I get a ValueError because of os.stat:
File "C:\Users\vulka\AppData\Local\Programs\Python\Python310\lib\site-packages\pyzipper\zipfile.py", line 820, in from_file
st = os.stat(filename)
ValueError: stat: embedded null character in path
[WHAT I TRIED]
So, I tried using mmap for this purpose but I don't think this can help me and if it can - then I have no idea how to make it work.
I also tried using fs.memoryfs.MemoryFS to temporarily create a virtual filesystem in memory to store all the files, then get them back to zip everything together and save the result to disk. Again, it failed. I got tons of different errors in my tests and, to be honest, there's very little information out there on this fs method, so even if what I'm trying to do is possible, I couldn't figure it out.
P.S.: I don't know if pyzipper (almost 1:1 zipfile with the addition of encryption) supports nested zip files at all. This could be the problem I'm facing, but if it doesn't, I'm open to any suggestions for a new approach. Also, I don't want to rely on third-party software, even if it is open source! (I'm talking about using 7zip to do all the archiving and encryption, even though it shouldn't even be possible to use it without saving the files to disk in the first place, which is the main thing I'm trying to avoid.)
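The traceback shows write() calling os.stat(filename), which suggests it treats its argument as a filesystem path, so passing raw zip bytes fails. One possible approach is to use writestr(), which takes an archive member name plus the bytes, as the first snippet in the question already does. The sketch below uses the standard-library zipfile module so that it runs without extra dependencies; the helper names zip_bytes and bundle and the member names are hypothetical, and encryption is left out. With pyzipper, the same writestr() call would presumably be made on the AESZipFile object after setpassword().

```python
import io
import zipfile


def zip_bytes(name, data):
    """Build a zip archive in memory holding one member and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def bundle(archives):
    """Pack (member_name, zip_bytes) pairs into one outer archive, in memory.

    writestr() is used instead of write() because the payloads are bytes,
    not paths on disk.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_STORED) as outer:
        for member_name, payload in archives:
            outer.writestr(member_name, payload)
    return buf.getvalue()


# Usage: two inner archives nested inside one outer archive.
inner_a = zip_bytes('a.txt', b'hello')
inner_b = zip_bytes('b.txt', b'world')
outer_bytes = bundle([('a.zip', inner_a), ('b.zip', inner_b)])
```

This requires each in-memory archive to be stored together with a name, for example as (name, bytes) tuples in the files list, since the outer archive needs a member name for every entry.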

Delete google drive files by extension with PyDrive

I'm trying to delete all files with the extension '.pdf' from a google drive folder.
Everything is fine with the API authentication, and I can upload files. The problem is deleting them.
Here I upload
upload_file = 'Test1.pdf'
gfile = drive.CreateFile({'parents': [{'id': '11SsSKYEATgn_VWzSb-8RjRL-VoIxvamC'}]})
gfile.SetContentFile(upload_file)
gfile.Upload()
Here I try to delete
delfile = drive.CreateFile({'parents': [{'id': '11SsSKYEATgn_VWzSb-8RjRL-VoIxvamC'}]})
filedel = "*.pdf"
delfile.SetContentFile(filedel)
delfile.Delete()
Error:
Traceback (most recent call last):
File "C:/Users/linol/Documents/ProjetoRPA-Python/RPA-TESTE.py", line 40, in <module>
delfile.SetContentFile(filedel)
File "C:\Users\linol\Documents\ProjetoRPA-Python\venv\lib\site-packages\pydrive\files.py", line 175, in SetContentFile
self.content = open(filename, 'rb')
OSError: [Errno 22] Invalid argument: '*.pdf'
I believe your goal and your current situation are as follows.
You want to delete the PDF files in a specific folder.
You want to achieve this using PyDrive for Python.
You have already been able to get and put values for Google Drive using the Drive API.
In this case, I would like to propose the following flow.
Retrieve the list of PDF files from the specific folder.
Delete the files using the file list.
When the above flow is reflected in the script, it becomes as follows.
Sample script:
Please replace ### with your folder ID.
# 1. Retrieve the list of PDF files from the specific folder.
fileList = drive.ListFile({'q': "'###' in parents and mimeType='application/pdf'"}).GetList()
# 2. Delete the files using the file list.
for e in fileList:
    drive.CreateFile({'id': e['id']}).Trash()
    # drive.CreateFile({'id': e['id']}).Delete() # When you use this, the files are completely deleted. Please be careful with this.
This sample script retrieves the files using the mimeType. If you want to retrieve the files by filename instead, you can use fileList = drive.ListFile({'q': "'###' in parents and title contains '.pdf'"}).GetList().
IMPORTANT: In this sample script, when Delete() is used, the files are completely deleted from Google Drive. So I recommend using Trash() instead of Delete() when first testing the script. That way, the files are moved to the trash rather than deleted, and you can test the script safely.
Note:
It seems that PyDrive uses Drive API v2. Please be careful about this.
Reference:
PyDrive

How do I upload an .xlsx file to an FTP without creating a local file?

I am writing a script to pull xml files from an FTP, turn them into an .xlsx file, and re-upload to a different directory on the same FTP. I want to create the .xlsx file within my script instead of copying the xml data into a template and uploading my local file.
I tried creating a filename for the .xlsx doc, but I realize that I need to save it before I can upload it to the FTP. My question is: would it be better to create a temporary folder on the server where the script is being run and empty the folder afterwards, or is there a way to upload the doc without saving it anywhere (preferred)? I will be running the script on a Windows server.
ftps.cwd(ftpExcelDir)
wbFilename = str(orderID + '.xlsx')
savedFile = saving the file somewhere # this is the part I'm having trouble with
ftps.storline('STOR ' + wbFilename, savedFile)
With the following code, I can get the .xlsx files to save to the FTP, but I receive an invalid extension/corrupt file error from Excel:
ftps.cwd(ftpExcelDir)
wbFilename = str(orderID + '.xlsx')
inMemoryWB = io.BytesIO()
wb.save(inMemoryWB)
ftps.storbinary('STOR ' + wbFilename, inMemoryWB)
The ftp functions take file objects... but those don't strictly speaking need to be files. Python has BytesIO and StringIO objects which act like files, but are backed by memory. See: https://stackoverflow.com/a/44672691/8833934
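One likely cause of the corrupt file in the second snippet is that the BytesIO position is left at the end of the data after saving, so storbinary() reads nothing from it. Below is a minimal sketch of the in-memory approach with the stream rewound before upload. The helper name upload_in_memory and the save_to_stream parameter are hypothetical; it assumes ftp is an ftplib.FTP or FTP_TLS instance and that the workbook's save method accepts a file-like object, as the question's wb.save(inMemoryWB) call implies.

```python
import io


def upload_in_memory(ftp, remote_name, save_to_stream):
    """Serialize into a BytesIO buffer and upload it without touching disk.

    `save_to_stream` is any callable that writes bytes to a file-like
    object, for example the `wb.save` method from the question.
    Returns the number of bytes held in the buffer.
    """
    buffer = io.BytesIO()
    save_to_stream(buffer)
    # Rewind: after writing, the position is at the end of the buffer,
    # and storbinary() reads from the current position.
    buffer.seek(0)
    ftp.storbinary('STOR ' + remote_name, buffer)
    return buffer.getbuffer().nbytes
```

In the question's terms, the call would look something like upload_in_memory(ftps, wbFilename, wb.save).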

How to obtain a file object from a variable or from http URL without actually creating a file?

I want to manipulate a downloaded PDF using PyPDF and for that, I need a file object.
I use GAE to host my Python app, so I cannot actually write the file to disk.
Is there any way to obtain the file object from URL or from a variable that contains the file contents?
TIA.
Most tools (including urllib) already give you a file-like, but if you need true random access then you'll need to create a StringIO.StringIO and read the data into it.
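As a rough illustration of the in-memory approach: in Python 3, io.BytesIO is the binary counterpart of the Python 2 StringIO.StringIO mentioned above. The sketch below wraps bytes that are already held in a variable; the helper name as_file_object and the sample bytes are hypothetical placeholders, not a real PDF.

```python
import io


def as_file_object(data):
    """Wrap raw bytes in a seekable, file-like object backed by memory."""
    return io.BytesIO(data)


# Placeholder content standing in for a downloaded PDF body.
pdf_bytes = b"%PDF-1.4\nplaceholder body\n"
pdf_file = as_file_object(pdf_bytes)

# Random access works as it would on a real file.
pdf_file.seek(0, io.SEEK_END)
size = pdf_file.tell()
pdf_file.seek(0)
header = pdf_file.read(5)
pdf_file.seek(0)  # rewind before handing the object to a PDF library
```

The resulting object can be passed to any library that only needs read(), seek(), and tell(), so nothing has to be written to disk.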
In GAE you can use the blobstore to read, write file data and to upload and download files. And you can use the File API:
Example:
_file = files.blobstore.create(mime_type=mimetype, _blobinfo_uploaded_filename='test')
with files.open(_file, 'a') as f:
    f.write(somedata)
files.finalize(_file)

Extract files from zip folder and store these files in blobstore

I want to upload a zip folder from a file input in a form, then extract the contents of the uploaded zip and store the contained files in the blobstore, so that they can be downloaded later after being put in one folder. The problem is that I can't read the zip folder directly. I tried this:
form = cgi.FieldStorage()
file_upload = form['file']
zip1=file_upload.filename
zipstream=StringIO.StringIO(zip1.read())
But the problem remains that I can't read the zip this way. I also tried to read the zip folder directly like this:
z1=zipfile.ZipFile(zip1,"r")
But this raised an error. Can anyone help me? Thanks in advance.
Based on your comment, it sounds like you need to take a closer look at the cgi module documentation, which includes the following:
If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute...
This suggests that you need to modify your code to look something like:
form = cgi.FieldStorage()
file_upload = form['file']
z1 = zipfile.ZipFile(file_upload.file, 'r')
There are additional examples in the documentation.
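Once the upload is opened through its file attribute, the members can be read without extracting anything to disk. The following is a minimal sketch in which an in-memory BytesIO stands in for file_upload.file; the helper name read_zip_members and the sample member names are hypothetical, and storing each member in the blobstore is left out.

```python
import io
import zipfile


def read_zip_members(fileobj):
    """Return a dict mapping member names to their bytes for a zip file object."""
    contents = {}
    with zipfile.ZipFile(fileobj, 'r') as archive:
        for info in archive.infolist():
            if info.is_dir():
                # Directory entries carry no data of their own.
                continue
            contents[info.filename] = archive.read(info)
    return contents


# Build a small archive in memory to stand in for the uploaded file.
upload = io.BytesIO()
with zipfile.ZipFile(upload, 'w') as zf:
    zf.writestr('docs/readme.txt', b'hello')
    zf.writestr('data.csv', b'a,b\n1,2\n')
upload.seek(0)

members = read_zip_members(upload)
```

Each value in the returned dict is the raw content of one file, which could then be handed to whatever storage call the application uses.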
You don't have to extract files from the zip in order to make them available for download - see this post for an example of serving direct from a zip. You can adapt that code if you want to extract the files and store them individually in the blobstore.
