I'm writing a program where I take a number of files and zip each of them with encryption using pyzipper, writing the archives to io.BytesIO() objects so I keep them in memory. Now, after some other additions, I want to take all of these in-memory files and zip them together into a single encrypted zip file, again using pyzipper.
The code looks something like this:
# Create the in-memory file object
in_memory = BytesIO()
# Create the zip file and open in write mode
with pyzipper.AESZipFile(in_memory, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zip_file:
    # Set password
    zip_file.setpassword(b"password")
    # Save "data" with file_name
    zip_file.writestr(file_name, data)
# Go to the beginning
in_memory.seek(0)
# Read the zip file data
data = in_memory.read()
# Add the data to a list
files.append(data)
So, as you may guess, the "files" list is an attribute of a class, and the whole thing above is a function that does this a number of times until you end up with the full files list. For simplicity's sake, I removed most of the irrelevant parts.
I get no errors for now, but when I try to write all files to a new zip file I get an error. Here's the code:
with pyzipper.AESZipFile(test_name, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for file in files:
        zfile.write(file)
I get a ValueError because of os.stat:
File "C:\Users\vulka\AppData\Local\Programs\Python\Python310\lib\site-packages\pyzipper\zipfile.py", line 820, in from_file
st = os.stat(filename)
ValueError: stat: embedded null character in path
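Looking at the traceback, write() ends up calling os.stat() on whatever it is given, so it treats the raw zip bytes as a filesystem path. A minimal sketch of the workaround I would expect, using writestr() with made-up member names (inner_0.zip etc. are just placeholders):

import pyzipper

# Placeholders standing in for the class attribute and the output name.
files = [b"zip bytes of file 1", b"zip bytes of file 2"]
test_name = "outer.zip"

with pyzipper.AESZipFile(test_name, "w", compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for i, file_bytes in enumerate(files):
        # writestr() takes an archive member name plus the bytes, so nothing
        # has to exist on disk; write() expects a path it can os.stat().
        zfile.writestr(f"inner_{i}.zip", file_bytes)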
[WHAT I TRIED]
So, I tried using mmap for this purpose, but I don't think it can help me, and if it can, I have no idea how to make it work.
I also tried using fs.memoryfs.MemoryFS to temporarily create a virtual filesystem in memory, store all the files there, then read them back to zip everything together and save it to disk. Again, this failed. I got tons of different errors in my tests and, to be honest, there's very little information out there on this fs approach; even if what I'm trying to do is possible with it, I couldn't figure it out.
P.S.: I don't know if pyzipper (almost 1:1 with zipfile, plus encryption) supports nested zip files at all. That could be the problem I'm facing, but if it doesn't, I'm open to suggestions for a new approach. Also, I don't want to rely on third-party software, even if it is open source! (I'm talking about using 7-Zip to do all the archiving and encryption, which shouldn't even be possible without saving the files to disk first, and avoiding that is the main thing I'm trying to do.)
I know how to use 'defaultextension' and 'filetypes', as follows:
self.filetypes = (('CSV files', '*.csv'), ('CSV files', '*.csv'))
self.result_file = fd.asksaveasfile(filetypes = self.filetypes, defaultextension = 'csv')
I can simply add the extension when entering the filename, but I'd prefer not to do that. If I enter 'result' for my filename, I'd like for the actual filename to be result.csv.
While I'm at it, I know my filetypes specification looks a little odd, two identical options. When reading files, I couldn't figure out how to provide only one option without getting an error message. This seems to work, at least when reading. Not sure if that's part of my problem when writing.
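For reference, here is a sketch of the single-entry form I would expect to work, in case the shape of the argument is the issue (wrapping the one pair in a list keeps it a sequence of (label, pattern) tuples):

from tkinter import filedialog as fd

# A one-element list (or a one-tuple written with a trailing comma) is still a
# sequence of (label, pattern) pairs, which is what the dialog expects.
filetypes = [('CSV files', '*.csv')]
result_file = fd.asksaveasfile(filetypes=filetypes, defaultextension='.csv')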
You can use the Path.suffix attribute from the pathlib library.
Assuming you have a file called 'test.csv' in the same directory:
import pandas as pd
from pathlib import Path

file = Path('test.csv')
df = pd.read_csv(file)
# do stuff
# file.suffix already includes the leading dot, so don't add another one
df.to_csv(f"new_name{file.suffix}")
print(file.suffix)
# prints '.csv'
I just figured this out myself. Rather than delete the question, I thought it might be better to provide the answer, in case someone else would benefit from it.
I was using the 'asksaveasfile()' function, which actually opens a file for writing, creating it if necessary.
I have discovered that 'asksaveasfilename()' returns the name of a file the user proposes to open, but does not actually open the file. That allows me to easily add whatever extension I like prior to opening the file and writing to it.
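For anyone who wants a concrete sketch of that approach (the names and the CSV contents here are just illustrative):

from tkinter import filedialog as fd

filetypes = [('CSV files', '*.csv')]
# asksaveasfilename() only returns the chosen path; nothing is opened yet.
filename = fd.asksaveasfilename(filetypes=filetypes, defaultextension='.csv')
if filename:
    # Make sure the extension is there before opening the file ourselves.
    if not filename.endswith('.csv'):
        filename += '.csv'
    with open(filename, 'w', newline='') as f:
        f.write('col1,col2\n')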
I am reading a yaml file like so in Python3:
import yaml

def get_file():
    file_name = "file.yml"
    properties_stream = open(file_name, 'r')
    properties_file = yaml.safe_load(properties_stream)
    properties_stream.close()
    print(properties_file)
    return properties_file
When I update file.yml by adding keys or deleting all the contents of the file, the print(properties_file) statement still shows the contents the file had the first time I ran this code.
Any ideas why this might be happening?
Anthon was correct. It turned out I had two folders with the same name in my project directory, in a different subdirectory, and the code was reading from the wrong file.
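In case anyone hits the same thing, a quick way to confirm which file is actually being read is to print the resolved path before loading (a small sketch, not from the original code):

from pathlib import Path
import yaml

file_name = "file.yml"
# Resolve the relative name against the current working directory so the
# file actually being opened is unambiguous.
print(Path(file_name).resolve())

with open(file_name, 'r') as stream:
    print(yaml.safe_load(stream))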
Could you give me an explanation, please?
I have part of my code using ConfigParser.
The file I am reading is in the directory ~/ui/config.cfg.
After I call the function below, I get a new file in the directory where my module lives, which is ~/ui/helper/config.cfg.
import ConfigParser

class CredentialsCP:
    def __init__(self, cloud_name=None):
        self.config = ConfigParser.ConfigParser()
        self.cloud_name = cloud_name

    def rewrite_pass_in_config(self, cloud, new_pass):
        if new_pass:
            self.config.read('config.cfg')
            self.config.set(cloud, 'password', new_pass)
            with open('config.cfg', 'wb') as configfile:
                self.config.write(configfile)
        else:
            return False
It creates a new file in the directory where I am running my code from, but I need the original file to be rewritten. How can I do that, and why do I keep getting this behavior?
Since you're using the same relative file name (config.cfg) for both reading and writing (and you're not changing the working directory), both operations act on the same file. And since the write ends up creating ~/ui/helper/config.cfg (it appears after running the code), that's also the file you're reading from.
So you are not opening (for reading) the file that you think you are. From the Python documentation for read(filenames, encoding=None):
If a file named in filenames cannot be opened, that file will be ignored.
...
If none of the named files exist, the ConfigParser instance will contain an empty dataset.
You're reading from a file that doesn't exist, which yields an empty config, and that empty config is what gets written out. To fix the problem, specify the desired file by its full (or correct relative) path. You could have something like:
In __init__:

    self.file_name = os.path.expanduser("~/ui/config.cfg")  # Or any path processing code

In rewrite_pass_in_config:

Read:

    self.config.read(self.file_name)

Write:

    with open(self.file_name, "wb") as configfile:
        self.config.write(configfile)
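Putting it together, a minimal sketch of the class with an explicit path (assuming Python 3's configparser here; the original code looks like Python 2, where the module is ConfigParser and the file is opened in 'wb' mode):

import os
import configparser  # Python 3 spelling of the module

class CredentialsCP:
    def __init__(self, cloud_name=None):
        self.config = configparser.ConfigParser()
        self.cloud_name = cloud_name
        # Always point at the intended file, regardless of the working directory.
        self.file_name = os.path.expanduser("~/ui/config.cfg")

    def rewrite_pass_in_config(self, cloud, new_pass):
        if not new_pass:
            return False
        self.config.read(self.file_name)
        self.config.set(cloud, 'password', new_pass)
        # Text mode in Python 3; 'wb' was the Python 2 convention.
        with open(self.file_name, 'w') as configfile:
            self.config.write(configfile)
        return True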
Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content you can call a write to save the content back to file. The question is how to do this in a safe way.
Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done you end up with a half written file and you have lost data.
A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.
Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.
On POSIX systems I guess you can use the rename system call which is an atomic operation. But how would you do this best on a Windows system? In particular, how do you handle this best using Python?
Also, is there another scheme for safely writing to files?
If you look at Python's documentation, it mentions that os.rename() is an atomic operation (on POSIX systems). So in your case, writing the data to a temporary file and then renaming it to the original file would be quite safe.
Another way could work like this:
let original file be abc.xml
create abc.xml.tmp and write new data to it
rename abc.xml to abc.xml.bak
rename abc.xml.tmp to abc.xml
after new abc.xml is properly put in place, remove abc.xml.bak
As you can see, you still have abc.xml.bak, which you can use to restore the original if anything goes wrong with the tmp file or with moving it into place.
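A minimal sketch of that sequence in Python (the file names follow the steps above; os.replace is used because it overwrites the destination atomically on both POSIX and Windows, which plain os.rename does not do on Windows):

import os

original = "abc.xml"
tmp = original + ".tmp"
bak = original + ".bak"

# Set up an existing original just so the sketch is self-contained.
with open(original, "w") as f:
    f.write("<config>old data</config>")

# 1. Write the new data to the temporary file.
with open(tmp, "w") as f:
    f.write("<config>new data</config>")

# 2. Keep the old file around as a backup.
os.replace(original, bak)

# 3. Move the new file into place (atomic within the same filesystem).
os.replace(tmp, original)

# 4. Only once the new file is safely in place, drop the backup.
os.remove(bak)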
If you want to be POSIXly correct and safe, you have to:
Write to temporary file
Flush and fsync the file (or fdatasync)
Rename over the original file
Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall for whole seconds of disk I/O as a result, depending on other outstanding I/O.
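A sketch of those three steps (the file names are illustrative; the directory fsync at the end is an optional extra so that the rename itself is also durable):

import os

target = "settings.xml"
tmp = target + ".tmp"

# 1. Write to a temporary file in the same directory as the target.
with open(tmp, "w") as f:
    f.write("<config>new data</config>")
    # 2. Flush Python's buffers, then ask the OS to push the data to disk.
    f.flush()
    os.fsync(f.fileno())

# 3. Rename over the original (atomic on POSIX within one filesystem).
os.replace(tmp, target)

# Optional: fsync the containing directory so the rename is on disk too.
dir_fd = os.open(os.path.dirname(os.path.abspath(target)), os.O_RDONLY)
try:
    os.fsync(dir_fd)
finally:
    os.close(dir_fd)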
Notice that rename is not an atomic operation in POSIX -- at least not in relation to the file data, as you might expect. However, most operating systems and filesystems will behave this way. But it seems you missed the very large Linux discussion about ext4 and filesystem guarantees about atomicity. I don't know exactly where to link, but here is a start: ext4 and data loss.
Notice, however, that on many systems rename will be as safe in practice as you expect. Still, it is in a way not possible to get both -- performance and reliability -- across all possible Linux configurations!
With a write to a temporary file followed by a rename of the temporary file, one would expect the operations to be dependent and executed in order.
The issue, however, is that most, if not all, filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take journaling in HFS+ or ext3/ext4, for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself first, then the user's data, in that order.
Ext4 did break the rename expectation when it first came out, but heuristics were added to resolve it. The issue is not a failed rename, but a successful one: ext4 might successfully register the rename but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file, with neither the original nor the new data.
So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!
In the Win API I found the quite nice function ReplaceFile, which does what its name suggests, even with an optional backup. There is always the DeleteFile/MoveFile combo as a fallback.
In general, what you want to do is really good, and I cannot think of any better write scheme.
A simplistic solution: use tempfile to create a temporary file, and if writing succeeds, just rename the file to your original configuration file.
Note that rename is not atomic across filesystems. You'll have to resort to a slight workaround (e.g. tempfile on target filesystem, followed by a rename) in order to be really safe.
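A sketch of that workaround, creating the temporary file in the target's own directory so the final rename never crosses a filesystem boundary (the paths are illustrative):

import os
import tempfile

target = "app.conf"
target_dir = os.path.dirname(os.path.abspath(target))

# Create the temp file next to the target so os.replace() stays on one filesystem.
fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
try:
    with os.fdopen(fd, "w") as f:
        f.write("key = value\n")
    os.replace(tmp_path, target)   # atomic rename within the same filesystem
except BaseException:
    os.unlink(tmp_path)            # clean up the temp file if anything failed
    raise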
For locking a file, see portalocker.
The standard solution is this.
Write a new file with a similar name. X.ext# for example.
When that file has been closed (and perhaps even read back and checksummed), then you do two renames:
X.ext (the original) to X.ext~
X.ext# (the new one) to X.ext
(Only for the crazy paranoids) call the OS sync function to force dirty buffer writes.
At no time is anything lost or corruptible. The only glitch can happen during the renames, but even then you haven't lost or corrupted anything. The original is recoverable right up until the final rename.
Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.
from ctypes import windll
from ctypes.wintypes import BOOL, DWORD, LPVOID, LPWSTR

ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
    LPWSTR,
    LPWSTR,
    LPWSTR,
    DWORD,
    LPVOID,
    LPVOID,
]

REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4
I then tested the behavior using this script.
from jaraco.windows.api.filesystem import ReplaceFile
import os
open('orig-file', 'w').write('some content')
open('replacing-file', 'w').write('new content')
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)
assert open('orig-file').read() == 'new content'
assert open('orig-backup').read() == 'some content'
assert not os.path.exists('replacing-file')
While this only works in Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.
There's now a codified, pure-Python, and I dare say Pythonic solution to this in the boltons utility library: boltons.fileutils.atomic_save.
Just pip install boltons, then:
from boltons.fileutils import atomic_save
with atomic_save('/path/to/file.txt') as f:
    f.write('this will only overwrite if it succeeds!\n')
There are a lot of practical options, all well-documented. Full disclosure, I am the author of boltons, but this particular part was built with a lot of community help. Don't hesitate to drop a note if something is unclear!
You could use the fileinput module to handle the backing-up and in-place writing for you:
import fileinput

for line in fileinput.input(filename, inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup
    # and standard output to be redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.

    # manipulate line
    newline = process(line)
    print(newline)
If you need to read in the entire contents before you can write the new lines, you can do that first and then print the entire new contents with:

newcontents = process(contents)
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    print(newcontents)
    break
If the script ends abruptly, you will still have the backup.