If you're using the tempfile library, you might have run into a situation where a temporary file is not deleted automatically even after your program has completed. This matters for programs that are run many times, because the directory gets messy once the temporary files pile up.
This is a self-answered Q&A, so here's my solution.
Apparently, with its default settings tempfile.TemporaryFile did not automatically delete my temporary file, but using tempfile.NamedTemporaryFile with a prefix works:
import os
import tempfile

with tempfile.NamedTemporaryFile(prefix="anything_",
                                 dir=os.getcwd()) as tempf:
    '''put something'''
    tempf.seek(0)
Notes:
os.getcwd() returns the current working directory, so the temporary file is created there.
Your temporary file will be named anything_ followed by random characters (e.g. anything_23mem).
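To check what actually gets cleaned up, here is a minimal sketch. The helper name demo_cleanup is a placeholder and not part of the original answer. NamedTemporaryFile is documented to remove the file on close when delete=True, which is the default, so the file should exist inside the with block and be gone afterwards:

```python
import os
import tempfile


def demo_cleanup(directory=None):
    """Return (existed_inside, exists_after) for a NamedTemporaryFile."""
    directory = directory or os.getcwd()
    with tempfile.NamedTemporaryFile(prefix="anything_", dir=directory) as tempf:
        tempf.write(b"put something")
        tempf.seek(0)
        path = tempf.name
        existed_inside = os.path.exists(path)
    # delete=True is the default, so leaving the with block removes the file
    return existed_inside, os.path.exists(path)
```

If the second value comes back True in your environment, the file is probably being created some other way (for example with delete=False or via mkstemp), which would explain leftovers.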
Hope it helps.
Related
I tried writing code where the file, when run, creates a replica of itself and deletes the original file.
Here is my code:
import shutil
import os
loc=os.getcwd()
shutil.move("./aa/test.py", loc, copy_function=shutil.copy2)
But the issues with this are:
This code is usable only once; to use it again, I need to change the name of the file or delete the newly created file and then run it again.
Also, if I run it inside a folder, it always creates the new file outside the folder (one directory up from the executing program).
How do I fix this?
Some Notes:
The copy should be made at the exact place where the original file was.
The folder was empty apart from this file. The file doesn't need to be in a folder; I just used it as a test instance.
Yes, I understand that if I delete the original file it should stop working. I actually have a picture in my mind of how it should work:
First, a new file with the exact same content will be made in the same path as the original file (probably with a different name).
Then the original file will be deleted, and the second file (the copy of the original) will be renamed to the exact name and extension of the original file that was deleted.
This thing above should repeat every time I run the .py file (containing this code) thus making this code portable and suitable for multiple uses.
Maybe the code to be executed after the file deletion can be stored in memory cache (I guess?).
Easiest way (in pseudo code):
Get name of current script.
Read all contents in memory.
Delete current script.
Write memory contents into new file with the same name.
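The four steps above can be sketched roughly as follows. The function name rewrite_in_place is hypothetical; inside the script itself you would pass __file__ as the path. Note that between the delete and the write the file briefly does not exist, so this is a sketch of the idea rather than a crash-safe routine:

```python
import os


def rewrite_in_place(script_path):
    """Read a file into memory, delete it, and recreate it under the same name."""
    script_path = os.path.abspath(script_path)   # 1. get the name (full path)
    with open(script_path, "rb") as src:
        contents = src.read()                    # 2. read all contents into memory
    os.remove(script_path)                       # 3. delete the current file
    with open(script_path, "wb") as dst:
        dst.write(contents)                      # 4. write the contents back
    return script_path


# Inside the script itself one would call: rewrite_in_place(__file__)
```

Because the path is resolved to an absolute one, the new file lands in the same folder as the original regardless of the current working directory.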
This code is usable only once; to use it again, I need to change the name of the file or delete the newly created file and then run it again.
That is of course because the file is named differently each time. You could approach this by having no other files in that folder, or by always prefixing the filename in the same way, so that you can find the file even though its name keeps changing.
Also, if I run it inside a folder, it always creates the new file outside the folder (one directory up from the executing program).
That is because you move it from ./aa to ./. You can take the path of the file and reuse everything apart from the filename, and then the new file will be in the same folder.
Hey TheKaushikGoswami,
I believe your code does exactly what you told it to and, as everybody before me stated, it surely only works once. :)
I would like to throw in another idea:
First off, I personally think shutil.move is more of a method for actually moving a file into another directory, as you did by accident.
https://docs.python.org/3/library/shutil.html#shutil.move
So why not first parameterize your folder (which makes GUI/command-line access easier), then copy to a temporary file, and then copy back from that temporary file? That way you won't get caught by errors raised when you try to create files that already exist, and you get easy-to-understand code!
Like so:
import shutil
import os

try:
    os.mkdir('./aa/')
except FileExistsError:
    print('Folder already exists!')

dest = './aa/'
file = 'test.py'
copypath = dest + 'tmp' + file
srcpath = dest + file

# Copy to a temporary name, remove the original, then copy back and clean up
shutil.copy2(srcpath, copypath, follow_symlinks=True)
os.remove(srcpath)
shutil.copy2(copypath, srcpath, follow_symlinks=True)
os.remove(copypath)
But may I ask what your use-case is for that since it really doesn't change anything for me other than creating an exact same file?
I had a quick google of this but couldn't find anything. I'm using os to get a list of all the file names in the current working directory using the following code:
path = os.getcwd()
files = os.listdir(path)
The list of files returns fine, but the last element has an extra '~$' that isn't in the actual file name. For example:
files
['File1.xlsx', 'File2.xlsx', '~$File3.xlsx']
This is then causing an issue when I iterate through these files to try and import them, as I get the error of:
[Errno 2] No such file or directory: 'C:\\Users\\$File3.xlsx'
If anyone knows why this happens and how I can fix/prevent it, that would be great!
Just thought I'd answer in case anyone else has this issue.
It's nothing to do with os. It happened because I had File3 open in Excel while pulling the list of file names. I've found out that opening a Microsoft Office document creates a temporary 'lock' file, whose name is prefixed with '~$' (this is how it can re-open unsaved data if it crashes etc).
I found the below from here:
The files you are describing are so-called owner files (sometimes
referred to as "lock" files). An owner file is created when you work
with a document ... and it should be deleted when you save your
document and exit.
There's also a SO question about this within Microsoft files, which can be found here
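If closing the document first is not an option, one way to avoid the error is to skip those owner files when building the list. This is a small sketch; the function name list_real_files is made up for illustration:

```python
import os


def list_real_files(path):
    """List directory entries, skipping Office owner ("lock") files."""
    return sorted(name for name in os.listdir(path) if not name.startswith("~$"))
```

The result is sorted only to make the order predictable, since os.listdir returns entries in arbitrary order.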
Using IronPython 2.6 (I'm new), I'm trying to write a program that opens a file, saves it at a series of locations, and then opens/manipulates/re-saves those. It will be run by an upper-level program on a loop, and this entire procedure is designed to catch/preserve corrupted saves so my company can figure out why this glitch of corruption occasionally happens.
I've currently worked out the Open/Save to locations parts of the script and now I need to build a function that opens, checks for corruption, and (if corrupted) moves the file into a subfolder (with an iterative renaming applied, for copies) or (if okay), modifies the file and saves a duplicate, where the process is repeated on the duplicate, sans duplication.
I tell this all for context to the root problem. In my situation, what is the most pythonic, consistent, and windows/unix friendly way to move a file (corrupted) into a subfolder while also renaming it based on the number of pre-existing copies of the file that exist within said subfolder?
In other words:
In a folder structure built as:
C:\Folder\test.txt
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
How do I move test.txt such that:
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
C:\Folder\Subfolder\test04.txt
In an automated way, so that I can loop my program overnight and have it stack up the corrupted text files I need to save? Note: They're not text files in practice, that's just an example.
Assuming you are going to use the convention of incrementally suffixing numbers to the files:
import os.path
import shutil

def store_copy(file_to_copy, destination):
    # Split "test.txt" into ("test", ".txt")
    filename, extension = os.path.splitext(os.path.basename(file_to_copy))
    # Count existing entries that start with the same base name
    existing_files = [i for i in os.listdir(destination) if i.startswith(filename)]
    new_file_name = "%s%02d%s" % (filename, len(existing_files), extension)
    shutil.copy2(file_to_copy, os.path.join(destination, new_file_name))
There's a fail case if you have subdirectories or files in destination whose names overlap with the source file, i.e. if your file is named 'example.txt' and the destination contains 'example_A.txt' as well as 'example.txt' and 'example01.txt'. If that's a possibility, you'd have to change the test in the existing_files = line to something more sophisticated.
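One possible "more sophisticated" test is a regular expression that only accepts the base name, an optional run of digits, and the extension. The sketch below is a variant of the function above, not the original author's code, and the name store_copy_strict is made up. It picks the highest existing number plus one instead of counting files, and uses the plain name when the destination has no copy yet:

```python
import os
import re
import shutil


def store_copy_strict(file_to_copy, destination):
    """Copy a file into destination under the next free numbered name."""
    stem, extension = os.path.splitext(os.path.basename(file_to_copy))
    # Accept "<stem><ext>" or "<stem><digits><ext>", but not "<stem>_A<ext>"
    pattern = re.compile(r"^%s(\d*)%s$" % (re.escape(stem), re.escape(extension)))
    numbers = []
    for name in os.listdir(destination):
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1) or 0))
    if numbers:
        new_name = "%s%02d%s" % (stem, max(numbers) + 1, extension)
    else:
        new_name = stem + extension
    target = os.path.join(destination, new_name)
    shutil.copy2(file_to_copy, target)
    return target
```

With the folder layout from the question this produces test04.txt. To move rather than copy, shutil.move could be substituted for shutil.copy2.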
I am making a Python program, and I want to check if it is the user's first time running the program (firstTime == True). After it has run, however, I want to permanently change firstTime to False. (There are other variables that I want to take input for that will stay if it is the first run, but that should be solved the same way).
Is there a better way than just reading from a file that contains the data? If not, how can I find where the file is being run from (so the data will be in the same dir)?
If you want to persist data, it will "eventually" be to disk files (though there might be intermediate steps, e.g. via a network or database system, eventually if the data is to be persistent it will be somewhere in disk files).
To "find out where you are",
import os
print(os.path.dirname(os.path.abspath(__file__)))
There are variants, but this is the basic idea. __file__ in any .py script or module gives the file path in which that file resides (won't work on the interactive command line, of course, since there's no file involved then;-).
The os.path module in Python's standard library has many useful functions to manipulate path strings -- here, we're using two: abspath to give an absolute (not relative) version of the file's path, so you don't have to care about what your current working directory is; and dirname to extract just the directory name (actually, the whole directory path;-) and drop the filename proper (you don't care if the module's name is foo.py or bar.py, only in what directory it is;-).
It is enough to just create a file in the same directory when the program is run for the first time (of course that file can be deleted to repeat the first-run steps, but that can sometimes be useful):
import os

firstrunfile = 'config.dat'
if not os.path.exists(firstrunfile):
    ## configuration here
    open(firstrunfile, 'w').close()  ## .write(configuration)
    print('First run')
    firstTime = True
else:
    print('Not first run')
    ## read configuration
    firstTime = False
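Combining the two answers, a sketch that also keeps the other first-run values might look like this. The JSON format, the function name load_or_create_config, and the file name config.json are all assumptions made for illustration:

```python
import json
import os


def load_or_create_config(config_path, defaults):
    """Return (first_time, config); the file is created on the first run."""
    if os.path.exists(config_path):
        with open(config_path) as fh:
            return False, json.load(fh)
    with open(config_path, "w") as fh:
        json.dump(defaults, fh)
    return True, dict(defaults)


# In a script, the path could be built next to the script itself:
# config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.json")
```

On the first run the defaults (or values collected from user input) are written out; every later run reads them back and reports that it is not the first time.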
Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content you can call a write to save the content back to file. The question is how to do this in a safe way.
Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done you end up with a half written file and you have lost data.
A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.
Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.
On POSIX systems I guess you can use the rename system call which is an atomic operation. But how would you do this best on a Windows system? In particular, how do you handle this best using Python?
Also, is there another scheme for safely writing to files?
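Python's documentation states that, on POSIX systems, a successful os.rename() is an atomic operation. On Windows, os.rename() raises an error if the destination already exists; os.replace() (Python 3.3+) overwrites the destination on both platforms. So in your case, writing data to a temporary file and then renaming it over the original file would be quite safe.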
Another way could work like this:
let original file be abc.xml
create abc.xml.tmp and write new data to it
rename abc.xml to abc.xml.bak
rename abc.xml.tmp to abc.xml
after new abc.xml is properly put in place, remove abc.xml.bak
As you can see, you still have abc.xml.bak, which you can use to restore the file if anything goes wrong with the tmp file or with renaming it into place.
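The steps above can be sketched as follows. The function name save_with_backup is made up, and os.replace is used instead of os.rename on the assumption that Python 3.3+ is available, so that a stale .bak file does not cause an error on Windows:

```python
import os


def save_with_backup(path, data):
    """Write data to path via a .tmp file, keeping a .bak until the swap is done."""
    tmp_path = path + ".tmp"
    bak_path = path + ".bak"
    with open(tmp_path, "wb") as fh:      # create abc.xml.tmp and write new data
        fh.write(data)
    had_original = os.path.exists(path)
    if had_original:
        os.replace(path, bak_path)        # abc.xml -> abc.xml.bak
    os.replace(tmp_path, path)            # abc.xml.tmp -> abc.xml
    if had_original:
        os.remove(bak_path)               # new file is in place, drop the backup
```

If the process dies between the two renames, abc.xml is briefly missing, but both abc.xml.bak and abc.xml.tmp are still on disk, so nothing is lost.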
If you want to be POSIXly correct and safe, you have to:
Write to temporary file
Flush and fsync the file (or fdatasync)
Rename over the original file
Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall on disk I/O for several seconds as a result, depending on other outstanding I/O.
Notice that rename is not an atomic operation in POSIX -- at least not in relation to file data as you expect. However, most operating systems and filesystems will work this way. But it seems you missed the very large Linux discussion about Ext4 and filesystem guarantees about atomicity. I don't know exactly where to link but here is a start: ext4 and data loss.
Notice however that on many systems, rename will be as safe in practice as you expect. However, it is in a way not possible to get both performance and reliability across all possible Linux configurations!
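In Python, the write, flush/fsync, rename sequence could look roughly like this. The function name atomic_write is hypothetical; the temporary file is created in the target's own directory so that the rename never crosses filesystems, and os.replace (Python 3.3+) is assumed for the final step:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temp file, force it to disk, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()               # push Python's buffer to the OS
            os.fsync(fh.fileno())    # ask the OS to push it to the disk
        os.replace(tmp_path, path)   # rename over the original file
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that mkstemp creates the file with restrictive permissions, so the replaced file may end up with a different mode than the original unless it is adjusted before the rename.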
With a write to a temporary file, then a rename of the temporary file, one would expect the operations are dependent and would be executed in order.
The issue however is that most, if not all filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take Journaling in HFS+ or Ext3,4 for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself, then preserve the user's data, in that order.
Ext4 did break the rename expectation when it first came out, however heuristics were added to resolve it. The issue is not a failed rename, but a successful rename. Ext4 might successfully register the rename, but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file and neither original nor new data.
So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!
In the Win API I found a quite nice function, ReplaceFile, that does what its name suggests, even with an optional backup. There is always the fallback of a DeleteFile, MoveFile combo.
In general what you want to do is really good. And I cannot think of any better write scheme.
A simplistic solution: use tempfile to create a temporary file, and if writing succeeds, then just rename the file to your original configuration file.
Note that rename is not atomic across filesystems. You'll have to resort to a slight workaround (e.g. tempfile on target filesystem, followed by a rename) in order to be really safe.
For locking a file, see portalocker.
The standard solution is this.
Write a new file with a similar name. X.ext# for example.
When that file has been closed (and perhaps even read and checksummed), then you do two renames.
X.ext (the original) to X.ext~
X.ext# (the new one) to X.ext
(Only for the crazy paranoids) call the OS sync function to force dirty buffer writes.
At no time is anything lost or corruptible. The only glitch can happen during the renames. But you haven't lost anything or corrupted anything. The original is recoverable right up until the final rename.
Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.
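from ctypes import windll
from ctypes.wintypes import BOOL, DWORD, LPVOID, LPWSTR

ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
    LPWSTR,  # lpReplacedFileName
    LPWSTR,  # lpReplacementFileName
    LPWSTR,  # lpBackupFileName
    DWORD,   # dwReplaceFlags
    LPVOID,  # lpExclude (reserved)
    LPVOID,  # lpReserved
]

REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4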
I then tested the behavior using this script.
from jaraco.windows.api.filesystem import ReplaceFile
import os
open('orig-file', 'w').write('some content')
open('replacing-file', 'w').write('new content')
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)
assert open('orig-file').read() == 'new content'
assert open('orig-backup').read() == 'some content'
assert not os.path.exists('replacing-file')
While this only works in Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.
There's now a codified, pure-Python, and I dare say Pythonic solution to this in the boltons utility library: boltons.fileutils.atomic_save.
Just pip install boltons, then:
from boltons.fileutils import atomic_save

with atomic_save('/path/to/file.txt') as f:
    f.write('this will only overwrite if it succeeds!\n')
There are a lot of practical options, all well-documented. Full disclosure, I am the author of boltons, but this particular part was built with a lot of community help. Don't hesitate to drop a note if something is unclear!
You could use the fileinput module to handle the backing-up and in-place writing for you:
import fileinput

for line in fileinput.input(filename, inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup
    # standard output is redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.
    # manipulate line
    newline = process(line)
    print(newline)
If you need to read in the entire contents before you can write the new lines,
then you can do that first, then print the entire new contents with
newcontents = process(contents)
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    print(newcontents)
    break
If the script ends abruptly, you will still have the backup.