Python shutil.move() inside a loop passes first instance but fails another - python

I am using shutil.move() to move files to another folder, potentially reading them in and making simple changes before moving them on. The question is: what safety logic can help avoid errors along the way?
EXAMPLE:
for file in file_list:
    if file == 'filename':
        shutil.move(source, destination)
    elif file == (something that makes the code fail):  # code errors out here
        shutil.move()
This code will actually move the first file, but then it breaks and moves nothing else. It is important to keep the files together as a batch. Maybe a nested if? Or just a general safety tip and/or references you have found suitable for this type of question. Thanks! -newb circa forever
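One common safety pattern for this (just a sketch, not from the post; file_list, source_dir, and dest_dir are placeholder names) is to wrap each move in try/except so a single bad file doesn't stop the rest of the batch:

import os
import shutil

failed = []
for name in file_list:
    src = os.path.join(source_dir, name)
    dst = os.path.join(dest_dir, name)
    try:
        shutil.move(src, dst)              # move this file
    except (OSError, shutil.Error) as exc:
        failed.append((name, exc))         # record the failure and keep going

if failed:
    print("These files could not be moved:", failed)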

Related

A file that, when run, creates a replica of itself and deletes the original file (Python)

I tried creating code where the file, when run, creates a replica of itself and deletes the original file.
Here is my code:
import shutil
import os
loc=os.getcwd()
shutil.move("./aa/test.py", loc, copy_function=shutil.copy2)
But the issue with this is that:
this code is only usable once; to use it again, I need to change the name of the file, or delete the newly created file, and then run it again.
Also, if I run it inside a folder, it will always create the new file outside the folder (one directory up from the executing program).
How Do I fix this?
Some Notes:
The copy should be made at the exact place where the original file was.
The folder was empty, containing just this file. The file doesn't need to be in a folder, but I just used it as a test instance.
Yes, I understand that if I delete the original file it should stop working. I actually have a picture in my mind of how it should work:
First, a new file with the exact same content will be made in the same path as the original file (with a different name, probably).
Then, the original file will be deleted and the second file (which is the copy of the original file) will be renamed to the exact name and extension of the original file that got deleted.
This should repeat every time I run the .py file (containing this code), making the code portable and suitable for multiple uses.
Maybe the code to be executed after the file deletion can be stored in a memory cache (I guess?).
Easiest way (in pseudo code):
Get name of current script.
Read all contents in memory.
Delete current script.
Write memory contents into new file with the same name.
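A minimal sketch of those four steps (assuming the script is small enough to read fully into memory; __file__ gives the path of the running script):

import os

script_path = os.path.abspath(__file__)   # 1. name of the current script
with open(script_path, 'rb') as f:
    contents = f.read()                   # 2. read all contents into memory
os.remove(script_path)                    # 3. delete the current script
with open(script_path, 'wb') as f:
    f.write(contents)                     # 4. write the contents back under the same name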
this code is only usable once; to use it again, I need to change the name of the file, or delete the newly created file, and then run it again.
That is of course because the file is now called something different. You could approach this by having no other files in that folder, or by always prefixing the filename in the same way, so that you can find the file even though its name changes each time.
Also, if I run it inside a folder, it will always create the new file outside the folder (one directory up from the executing program).
That is because you move it from ./aa to ./ (the current working directory). You can take the path of the original file and reuse it, apart from the filename, so the copy ends up in the same folder.
Hey TheKaushikGoswami,
I believe your code does exactly what you told it to do and, as everybody before me stated, it only works once. :)
I would like to throw in another idea:
First off, I'd say shutil.move is really meant for moving a file into another directory, which is what you did by accident.
https://docs.python.org/3/library/shutil.html#shutil.move
So why not first parameterize your folder (which makes it easier for GUI/command-line access), copy to a temporary file, and then copy back from that temporary file? That way you won't run into errors from trying to create files that already exist, and the code stays easy to understand.
Like so:
import shutil
import os

try:
    os.mkdir('./aa/')
except FileExistsError:
    print('Folder already exists!')

dest = './aa/'
file = 'test.py'
copypath = dest + 'tmp' + file
srcpath = dest + file

shutil.copy2(srcpath, copypath, follow_symlinks=True)
os.remove(srcpath)
shutil.copy2(copypath, srcpath, follow_symlinks=True)
os.remove(copypath)
But may I ask what your use case is, since this really doesn't do anything other than recreate the exact same file?

Python os.path.isfile() never returns

I was wondering if this is a common problem:
I wrote a program which copies files (if found) from one place to another. The first call is os.path.isfile(wholepath + file).
But this call never returns; instead the program just hangs. The problem could be that there are around a million files (multiple TB). In this case, is there a better solution?
The program has now been running for an hour and is not using much CPU (according to htop).
isfile() returns True if path is an existing regular file.
try this one:
print([fle for fle in os.listdir(wholepath) if os.path.isfile(os.path.join(wholepath, fle))])
Note that the list will be empty if your path contains only folders and no files.
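For a directory with on the order of a million entries, os.listdir has to build the entire list before returning anything, which can look like a hang. A possible alternative (my suggestion, not part of the answer above) is os.scandir, which yields entries lazily; wholepath is assumed to be defined as in the question:

import os

# iterate lazily instead of building one huge list in memory
with os.scandir(wholepath) as entries:
    for entry in entries:
        if entry.is_file():   # uses cached file-type info where the OS provides it
            print(entry.name)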

Incorrect file reading when using os.walk in python3

I am crawling through folders using the os.walk() method. In one of the folders, there is a large number of files, around 100,000 of them. The files look like: p_123_456.zip. But they are read as p123456.zip. Indeed, when I open windows explorer to browse the folder, for the first several seconds the files look like p123456.zip, but then change their appearance to p_123_456.zip. This is a strange scenario.
Now, I can't use time.sleep() because all folders and files are being read into Python variables in the looping line. Here is a snippet of the code:
for root, dirs, files in os.walk(srcFolder):
    os.chdir(root)
    for file in files:
        shutil.copy(file, storeFolder)
In the last line, I get a file-not-found exception saying that the file p123456.zip does not exist. Has anyone run into this mysterious issue? Any way to bypass this? What is the cause of this? Thank you.
You don't seem to be concatenating the actual folder name with the filenames. Try changing your code to:
for root, dirs, files in os.walk(srcFolder):
    for file in files:
        shutil.copy(os.path.join(root, file), storeFolder)
os.chdir should be avoided like the plague. For one thing, if the change succeeds, you are no longer in the directory from which you started your os.walk, and a second chdir to another relative folder will fail (either stopping your program or dropping you into an unexpected folder).
Just add the folder name as prefixes, and don't try using chdir.
Moreover, as for the comment from ShadowRanger above, os.walk officially breaks if you chdir inside its iteration - https://docs.python.org/3/library/os.html#os.walk - that is likely the root of the problem you had.

Pythonic way to smart-rename files for record-keeping sake?

Using IronPython 2.6 (I'm new), I'm trying to write a program that opens a file, saves it at a series of locations, and then opens/manipulates/re-saves those. It will be run by an upper-level program on a loop, and this entire procedure is designed to catch/preserve corrupted saves so my company can figure out why this glitch of corruption occasionally happens.
I've currently worked out the Open/Save to locations parts of the script and now I need to build a function that opens, checks for corruption, and (if corrupted) moves the file into a subfolder (with an iterative renaming applied, for copies) or (if okay), modifies the file and saves a duplicate, where the process is repeated on the duplicate, sans duplication.
I tell all this as context for the root problem. In my situation, what is the most Pythonic, consistent, and Windows/Unix-friendly way to move a (corrupted) file into a subfolder while also renaming it based on the number of pre-existing copies of the file that already exist within said subfolder?
In other words:
In a folder structure built as:
C:\Folder\test.txt
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
How do I move test.txt such that:
C:\Folder\Subfolder
C:\Folder\Subfolder\test.txt
C:\Folder\Subfolder\test01.txt
C:\Folder\Subfolder\test02.txt
C:\Folder\Subfolder\test03.txt
C:\Folder\Subfolder\test04.txt
In an automated way, so that I can loop my program overnight and have it stack up the corrupted text files I need to save? Note: they're not actually text files in practice; that's just an example.
Assuming you are going to use the convention of incrementally suffixing numbers to the files:
import os.path
import shutil

def store_copy(file_to_copy, destination):
    filename, extension = os.path.splitext(os.path.basename(file_to_copy))
    existing_files = [i for i in os.listdir(destination) if i.startswith(filename)]
    new_file_name = "%s%02d%s" % (filename, len(existing_files), extension)
    shutil.copy2(file_to_copy, os.path.join(destination, new_file_name))
There's a failure case if the destination contains subdirectories or files whose names overlap with the source file: for example, if your file is named 'example.txt' and the destination contains 'example_A.txt' as well as 'example.txt' and 'example01.txt', the count will be off. If that's a possibility, you'd have to change the test in the existing_files line to something more sophisticated.
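A hedged sketch of a stricter test, assuming the numbering convention from the answer above (the helper name and the exact pattern are mine):

import os
import re

def count_existing_copies(filename, extension, destination):
    # matches 'test.txt', 'test01.txt', 'test02.txt', ... but not 'test_A.txt' or 'testdata.txt'
    pattern = re.compile(re.escape(filename) + r"\d*" + re.escape(extension))
    return sum(1 for name in os.listdir(destination) if pattern.fullmatch(name))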

A safe, atomic file-copy operation

I need to copy a file from one location to another, and I need to throw an exception (or at least somehow recognise) if the file already exists at the destination (no overwriting).
I can check first with os.path.exists() but it's extremely important that the file cannot be created in the small amount of time between checking and copying.
Is there a built-in way of doing this, or is there a way to define an action as atomic?
There is in fact a way to do this, atomically and safely, provided all actors do it the same way. It's an adaptation of the lock-free whack-a-mole algorithm, and not entirely trivial, so feel free to go with "no" as the general answer ;)
What to do
1. Check whether the file already exists. Stop if it does.
2. Generate a unique ID.
3. Copy the source file to the target folder with a temporary name, say, <target>.<UUID>.tmp.
4. Rename† the copy to <target>-<UUID>.mole.tmp.
5. Look for any other files matching the pattern <target>-*.mole.tmp.
   - If their UUID compares greater than yours, attempt to delete it. (Don't worry if it's gone.)
   - If their UUID compares less than yours, attempt to delete your own. (Again, don't worry if it's gone.) From now on, treat their UUID as if it were your own.
6. Check again to see if the destination file already exists. If so, attempt to delete your temporary file. (Don't worry if it's gone. Remember your UUID may have changed in step 5.)
7. If you didn't already attempt to delete it in step 6, attempt to rename your temporary file to its final name, <target>. (Don't worry if it's gone; just jump back to step 5.)
8. You're done!
How it works
Imagine each candidate source file is a mole coming out of its hole. Half-way out, it pauses and whacks any competing moles back into the ground, before checking no other mole has fully emerged. If you run this through in your head, you should see that only one mole will ever make it all the way out. To prevent this system from livelocking, we add a total ordering on which mole can whack which. Bam! A PhD thesis lock-free algorithm.
† Step 4 may look unnecessary—why not just use that name in the first place? However, another process may "adopt" your mole file in step 5, and make it the winner in step 7, so it's very important that you're not still writing out the contents! Renames on the same file system are atomic, so step 4 is safe.
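A hedged sketch of those steps in Python (the function name and error handling are mine, UUIDs are compared as plain hex strings, and the rename-overwrite caveat discussed in the next answer still applies):

import glob
import os
import shutil
import uuid


def whack_a_mole_copy(source, target):
    # Returns True if this process's copy became <target>, False if <target> already existed.
    if os.path.exists(target):                              # step 1
        return False
    my_id = uuid.uuid4().hex                                # step 2
    tmp = "%s.%s.tmp" % (target, my_id)
    shutil.copy2(source, tmp)                               # step 3
    mole = "%s-%s.mole.tmp" % (target, my_id)
    os.rename(tmp, mole)                                    # step 4: atomic on the same file system

    while True:
        for other in glob.glob(glob.escape(target) + "-*.mole.tmp"):   # step 5
            other_id = other[len(target) + 1:-len(".mole.tmp")]
            if other_id == my_id:
                continue
            if other_id > my_id:
                try:
                    os.remove(other)                        # whack the later mole
                except OSError:
                    pass
            else:
                try:
                    os.remove(mole)                         # whack our own mole...
                except OSError:
                    pass
                my_id, mole = other_id, other               # ...and adopt theirs

        if os.path.exists(target):                          # step 6
            try:
                os.remove(mole)
            except OSError:
                pass
            return False

        # step 7: note that on POSIX os.rename silently replaces a target that
        # appeared in the tiny window after the step-6 check
        try:
            os.rename(mole, target)
            return True
        except OSError:
            continue                                        # our mole was adopted or whacked; back to step 5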
There is no way to do this; file copy operations are never atomic and there is no way to make them.
But you can write the file under a random, temporary name and then rename it. Rename operations have to be atomic. If the file already exists, the rename will fail and you'll get an error.
[EDIT2] rename() is only atomic if you do it in the same file system. The safe way is to create the new file in the same folder as the destination.
[EDIT] There is a lot of discussion whether rename is always atomic or not and about the overwrite behavior. So I dug up some resources.
On Linux, if the destination exists and both source and destination are files, then the destination is silently overwritten (man page). So I was wrong there.
But rename(2) still guarantees that either the original file or the new file remain valid if something goes wrong, so the operation is atomic in the sense that it can't corrupt data. It's not atomic in the sense that it prevents two processes from doing the same rename at the same time and you can predict the result. One will win but you can't tell which.
On Windows, if another process is currently writing the file, you get an error if you try to open it for writing, so one advantage for Windows, here.
If your computer crashes while the operation is written to disk, the implementation of the file system will decide how much data gets corrupted. There is nothing an application could do about this. So stop whining already :-)
There is also no other approach that works better or even just as well as this one.
You could use file locking instead. But that would just make everything more complex and yield no additional advantages (besides being more complicated which some people do see as a huge advantage for some reason). And you'd add a lot of nice corner cases when your file is on a network drive.
You could use open(2) with the mode O_CREAT, which would make the call fail if the file already exists. But that wouldn't prevent a second process from deleting the file and writing its own copy.
Or you could create a lock directory since creating directories has to be atomic as well. But that would not buy you much, either. You'd have to write the locking code yourself and make absolutely, 100% sure that you really, really always delete the lock directory in case of disaster - which you can't.
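As an illustration of the O_CREAT idea, adding O_EXCL makes creating the destination fail atomically with FileExistsError if it already exists, which closes the check-then-copy race from the question. This is only a sketch (the function name is mine), and the destination is visible while it is still being written:

import os
import shutil

def copy_no_overwrite(source, destination):
    # O_CREAT | O_EXCL: creating the destination fails if it already exists
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    flags |= getattr(os, "O_BINARY", 0)   # Windows-only flag; 0 elsewhere
    fd = os.open(destination, flags)
    with os.fdopen(fd, "wb") as dst, open(source, "rb") as src:
        shutil.copyfileobj(src, dst)      # note: a failed copy leaves a partial destination behind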
A while back my team needed a mechanism for atomic writes in Python and we came up with the following code (also available in a gist):
import os
import shutil
import stat
import tempfile


def copy_with_metadata(source, target):
    """Copy file with all its permissions and metadata.

    Lifted from https://stackoverflow.com/a/43761127/2860309

    :param source: source file name
    :param target: target file name
    """
    # copy content, stat-info (mode too), timestamps...
    shutil.copy2(source, target)
    # copy owner and group
    st = os.stat(source)
    os.chown(target, st[stat.ST_UID], st[stat.ST_GID])


def atomic_write(file_contents, target_file_path, mode="w"):
    """Write to a temporary file and rename it to avoid file corruption.

    Attribution: #therightstuff, #deichrenner, #hrudham

    :param file_contents: contents to be written to file
    :param target_file_path: the file to be created or replaced
    :param mode: the file mode, defaults to "w"; only "w" and "a" are supported
    """
    # Use the same directory as the destination file so that moving it across
    # file systems does not pose a problem.
    temp_file = tempfile.NamedTemporaryFile(
        delete=False,
        dir=os.path.dirname(target_file_path))
    try:
        # preserve file metadata if it already exists
        if os.path.exists(target_file_path):
            copy_with_metadata(target_file_path, temp_file.name)
        with open(temp_file.name, mode) as f:
            f.write(file_contents)
            f.flush()
            os.fsync(f.fileno())
        os.replace(temp_file.name, target_file_path)
    finally:
        if os.path.exists(temp_file.name):
            try:
                os.unlink(temp_file.name)
            except OSError:
                pass
With this code, copying a file atomically is as simple as reading it into a variable and then sending it to atomic_write.
The comments should provide a good idea of what's going on but I also wrote up this more complete explanation on Medium for anyone interested.
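A minimal usage sketch of that idea (file names are placeholders):

# read the source into memory, then write it out atomically at the destination
with open("source.txt") as f:
    contents = f.read()

atomic_write(contents, "destination.txt")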
