Python os.path.isfile() never returns

I was wondering if this is a common problem:
I wrote a program which copies files (if found) from one place to another. The first call is os.path.isfile(wholepath + file).
But this call never returns; instead the program hangs. The problem could be that there are millions of files (multiple TB). In that case, is there a better solution?
The program has now been running for an hour and is not using much CPU (htop).

isfile() returns True if path is an existing regular file.
try this one:
print([fle for fle in os.listdir(wholepath) if os.path.isfile(os.path.join(wholepath, fle))])
Note that the list will be empty if your path contains only folders and no files.
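With millions of entries, os.listdir() has to build the entire name list in memory before it returns anything, which can look like a hang. A lighter-weight sketch using os.scandir() (Python 3.5+), which streams directory entries lazily; the function name here is made up for illustration:

```python
import os

def iter_regular_files(dirpath):
    """Yield names of regular files in dirpath without building a full list."""
    with os.scandir(dirpath) as entries:
        for entry in entries:
            # entry.is_file() uses stat information cached by the OS where
            # available, avoiding a separate stat() call per entry
            if entry.is_file():
                yield entry.name
```

Processing entries one at a time this way keeps memory flat regardless of how large the directory is.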

Related

Python shutil.move() inside a loop passes first instance but fails another

I am using shutil.move() to move files to another folder... potentially even reading them in, making simple changes, and then moving them to another folder. The question is what safety logic can help avoid errors...
EXAMPLE:
for file in file_list:
    if file == 'filename':
        shutil.move(source, destination)
    elif file == (something to make code fail):  # code errors out
        shutil.move()
This code will actually move the first file but then breaks and moves nothing else. It is important to keep the files together as a batch... maybe a nested if? Or just a general tip on safety... and/or references you have found suitable for this type of question. Thanks! -newb circa forever
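One common pattern for this (a sketch, not from the original thread; the function and variable names are made up): wrap each individual move in try/except so one bad file cannot stop the batch, and collect the failures so they can be reported or retried afterwards.

```python
import os
import shutil

def move_batch(file_list, source_dir, dest_dir):
    """Move every file we can; return the failures instead of aborting."""
    failed = []
    for name in file_list:
        src = os.path.join(source_dir, name)
        dst = os.path.join(dest_dir, name)
        try:
            shutil.move(src, dst)
        except OSError as exc:
            # record this failure and keep going with the rest of the batch
            failed.append((name, exc))
    return failed
```

The returned list tells you exactly which files need attention, while everything else in the batch still gets moved.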

Rename directory with constantly changing name

I created a script that is supposed to download some data, then run a few processes. The data source (ArcGIS Online) always downloads the data as a zip file, and when it is extracted the folder name is a series of letters and numbers. I noticed that these occasionally change (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that os.listdir returns the folder name inside a list, with brackets and quotes: it comes back as ['f29a52b8908242f5b1f32c58b74c063b.gdb'], while the folder in the file explorer has no brackets or quotes. Below is my code and the error I receive.
import os
from zipfile import ZipFile

file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping " + file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
# removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!
os.listdir() returns a list of the files/objects that are in a folder.
Lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list in Python, you can use index notation (which happens to also use brackets, to tell Python which item in the list to use by referencing the index, or number, of the item).
Python list indices start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
files = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(files[0], "Test.gdb")
Square brackets indicate a list. Try x[0]; that should get rid of the brackets and give you just the data.
The return from listdir may be a list with only one value or a whole bunch.
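If the download folder could ever contain more than the one extracted entry, matching the name pattern is safer than blindly taking index 0. A sketch using the standard glob module (the helper name is made up; the single-match requirement is an assumption about this workflow):

```python
import glob
import os

def find_gdb(download_dir):
    """Return the single .gdb entry in download_dir, or raise if ambiguous."""
    matches = glob.glob(os.path.join(download_dir, "*.gdb"))
    if len(matches) != 1:
        raise RuntimeError(
            "expected exactly one .gdb folder, found %d" % len(matches))
    return matches[0]
```

Note that glob returns full paths, so os.renames(find_gdb(d), os.path.join(d, "Test.gdb")) renames the folder in place; passing a bare name like "Test.gdb" as the destination would instead move it relative to the current working directory.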

Python filecmp returns false on binary equal files

I have a zip file and a directory. The files contained within the zip file shall be copied to the directory if they do not exist within the directory or if they differ (not binary equal). So there are the following two cases.
The file from the zip is not contained in the directory
The directory already contains a file with the same name as the one from the zip
In the first case I am simply extracting the file directly to the directory (without preserving the directory structure of the zip on purpose).
In the second case I extract the file from the zip to a temp directory and compare them with the following code
extracted_member = os.path.join(TMP_DIR, os.path.basename(zip_member))
with zipfile.open(zip_member) as member_file, open(extracted_member, 'wb') as target_file:
    shutil.copyfileobj(member_file, target_file)
    print(filecmp.cmp(extracted_member, file_from_dir, False))
So, if I run the program twice without doing anything between the two executions I run into case 2 (as expected). The file comparison should return true at this point (at least to my understanding) but for some reason the result of print(...) always gives me False.
Does anybody know what I am doing wrong here or do I have a false understanding of the situation?
The problem is that the output file is probably not closed (so may be incompletely flushed/written) at this point since you're performing the filecmp operation within the context block.
Do it outside so the file is properly closed:
with zipfile.open(zip_member) as member_file, open(extracted_member, 'wb') as target_file:
    shutil.copyfileobj(member_file, target_file)
print(filecmp.cmp(extracted_member, file_from_dir, False))
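To see the effect in isolation, a minimal self-contained check (the file names are placeholders): two files written with identical bytes only reliably compare equal with shallow=False once both handles are closed, because an open file may still hold unflushed write buffers.

```python
import filecmp
import os
import tempfile

tmp = tempfile.mkdtemp()
a = os.path.join(tmp, "a.bin")
b = os.path.join(tmp, "b.bin")
data = b"same contents" * 1000

with open(a, "wb") as f:
    f.write(data)
with open(b, "wb") as f:
    f.write(data)

# Both files are closed (and therefore fully flushed) at this point, so a
# byte-by-byte comparison sees identical contents.
print(filecmp.cmp(a, b, shallow=False))  # True
```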

Best Practices when matching large number of files against large number of regex strings

I have a directory with several thousand files. I want to sort them into directories based on file name, but many of the file names are very similar.
My thinking is that I'm going to have to write up a bunch of regex strings and then do some sort of looping. This is my question:
Is one of these two options more optimal than the other? Do I loop over all my files and, for each file, check it against my regexes, keeping track of how many match? Or do I do the opposite and loop over the regexes and touch each file?
I had thought to do it in Python, as that's my strongest language, but I'm open to other ideas.
This is some code I use in a program of mine, modified for your purposes. It takes a directory (sort_dir), goes over every file there, creates directories based on the filenames, and then moves the files into those directories. Since you have not provided any information as to where or how you want to sort your files, you will have to add that part where I have indicated:
import os
import shutil

def sort_files(sort_dir):
    for f in os.listdir(sort_dir):
        if not os.path.isfile(os.path.join(sort_dir, f)):
            continue
        # this is the folder name to be created -- what do you want it to be?
        # (right now it is just the filename itself, so change this line)
        destination_path = os.path.join(sort_dir, f)
        if not os.path.exists(destination_path):
            os.mkdir(destination_path)
        try:
            shutil.move(os.path.join(sort_dir, f),
                        os.path.join(destination_path, f))
        except shutil.Error:
            # the destination already has a file with this name; skip it
            # rather than retrying forever
            continue

Python "if modified since" detection?

I wrote some code to tackle a work-related problem. The idea is that the program runs in an infinite loop and every 10 minutes it checks a certain folder for new files; if there are any, it copies them to another folder. My code reads all the files in the folder and drops the list into a txt file. After 10 minutes it checks the folder again and compares the old list with a new list. If they are identical, it doesn't do anything. This is where I stopped, because my idea felt iffy, and I started to think of better ways. Here's the code:
import time
import os, os.path

i = 1
while i == 1:
    folder = "folderlocation"

    def getfiles(dirpat):
        filelist = [s for s in os.listdir(dirpat)
                    if os.path.isfile(os.path.join(dirpat, s))]
        filelist.sort(key=lambda s: os.path.getmtime(os.path.join(dirpat, s)))
        return filelist

    newlist = getfiles(folder)
    outfile = 'c://old.txt'
    last = newlist
    text_file = open(outfile, "r")
    oldlist = text_file.read()
    text_file.close()
    if str(last) == str(oldlist):
        i == i
        print "s"
    else:
        with open(outfile, 'w') as fileHandle:
            # this part is virtually useless
            diff = list(set(lines) - set(last))
            print diff
            fileHandle.write(str(getfiles(folder)))
    time.sleep(600)
This implementation has some bugs in it and doesn't work as I would like it to.
Is there another way of tackling this? Is it possible for it to just remember the latest modification date, and if after 10 minutes there are newer files, point them out? I don't need the copying part just yet, but I really need the checking part. Does anyone have ideas or similar code?
The cleanest solution is file alteration monitoring. See this question for information for both *ix and Windows. To summarize, on Linux, you could use libfam, and Windows has FindFirstChangeNotification and related functions.
There are some clear problems with your script, such as NOP lines like i==i. You also don't need to convert to a string before comparing.
A technique I've been using (I was actually working on generalising it two hours ago) keeps a {path: os.stat(path).st_mtime} dict, polling in a while True loop with time.sleep and updating the entries and checking if they've changed. Simple and cross-platform.
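A minimal sketch of that mtime-dict approach (the function names and polling interval are placeholders; a real version would also want to handle files disappearing between polls):

```python
import os
import time

def snapshot(dirpath):
    """Map each regular file in dirpath to its last-modification time."""
    return {
        name: os.path.getmtime(os.path.join(dirpath, name))
        for name in os.listdir(dirpath)
        if os.path.isfile(os.path.join(dirpath, name))
    }

def changed_files(old, new):
    """Return names that are new or whose mtime differs from the old snapshot."""
    return [name for name, mtime in new.items() if old.get(name) != mtime]

def watch(dirpath, interval=600):
    old = snapshot(dirpath)
    while True:
        time.sleep(interval)
        new = snapshot(dirpath)
        for name in changed_files(old, new):
            print(name)  # new or modified since the last poll
        old = new
```

Unlike comparing stringified file lists, this also catches a file whose contents changed but whose name stayed the same, since its mtime moves.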
Another option for solving this problem would be to run rsync periodically, for example from cron.
