I wrote some code to tackle a work-related problem. The idea is that the program runs in an infinite loop and every 10 minutes checks a certain folder for new files; if there are any, it copies them to another folder. My code reads all the files in the folder and dumps the list into a txt file. After 10 minutes it checks the folder again and compares the old list with the new one. If they are identical, it does nothing. This is where I stopped, because my idea felt iffy and I started to think of better ways. Here's the code:
import time
import os, os.path

i=1
while i==1:
    folder="folderlocation"
    def getfiles(dirpat):
        filelist = [s for s in os.listdir(dirpat)
                    if os.path.isfile(os.path.join(dirpat, s))]
        filelist.sort(key=lambda s: os.path.getmtime(os.path.join(dirpat, s)))
        return filelist
    newlist=getfiles(folder)
    outfile='c://old.txt'
    last=newlist
    text_file = open(outfile, "r")
    oldlist = text_file.read()
    text_file.close()
    if str(last) == str(oldlist):
        i==i
        print "s"
    else:
        with open(outfile, 'w') as fileHandle:
            #this part is virtually useless
            diff=list(set(lines) - set(last))
            print diff
            fileHandle.write(str(getfiles(folder)))
    time.sleep(600)
This implementation has some bugs in it and doesn't work as I would like it to.
Is there another way of tackling it? Would it be possible for the program to just remember the latest modification date and, if after 10 minutes there are newer files, point them out? I don't need the copying part just yet, but I really need the checking part. Does anyone have ideas or some similar code?
The cleanest solution is file alteration monitoring. See this question for information on both *nix and Windows. To summarize: on Linux you could use libfam, and Windows has FindFirstChangeNotification and related functions.
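If you want to stay in Python, one cross-platform way to get those notifications (my suggestion, not something the links above prescribe) is the third-party watchdog package, which wraps the native APIs on each platform. A minimal sketch, reusing the "folderlocation" placeholder from the question:

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class NewFileHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Called for every file or directory created under the watched path.
        if not event.is_directory:
            print("new file:", event.src_path)

observer = Observer()
observer.schedule(NewFileHandler(), "folderlocation", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the main thread alive; events arrive on a worker thread
finally:
    observer.stop()
    observer.join()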
There are some clear problems with your script, such as NOP lines like i==i. You also don't need to convert the lists to strings before comparing them.
A technique I've been using (I was actually working on generalising it two hours ago) keeps a {path: os.stat(path).st_mtime} dict, polling in a while True loop with time.sleep, updating the entries, and checking whether they've changed. Simple and cross-platform.
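A minimal sketch of that approach, assuming the "folderlocation" placeholder and the 10-minute interval from the question:

import os
import time

def snapshot(folder):
    # Map each regular file's path to its last-modification time.
    return {os.path.join(folder, name): os.stat(os.path.join(folder, name)).st_mtime
            for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))}

folder = "folderlocation"  # placeholder from the question
before = snapshot(folder)
while True:
    time.sleep(600)  # poll every 10 minutes
    after = snapshot(folder)
    added = [p for p in after if p not in before]
    modified = [p for p in after if p in before and after[p] != before[p]]
    if added or modified:
        print("new:", added, "modified:", modified)
    before = after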
Another option for solving this problem would be to run rsync periodically, for example from cron.
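For example, a crontab entry along these lines (the paths are illustrative) would mirror the source folder every 10 minutes:

*/10 * * * * rsync -a /path/to/source/ /path/to/destination/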
Related
I currently have a script using os.walk in a for loop like this:
empty = []
if os.path.exists(pickle_location):
    df = pd.read_pickle(pickle_location)
    #INSERT UPDATE FUNCTION HERE
else:
    for i in file_lists:
        for root, dir, files in os.walk(i, topdown=False):
            for name in files:
                empty.append(root+"/"+name)
I was wondering how to write the update function so the script doesn't run from scratch every time, or at least cuts the run time significantly. I'm running it over about 1.2 million files/locations, so it currently takes more than an hour.
I have looked in the documentation for something, but I don't have much experience. Maybe there is a smarter way? Thanks in advance.
I'm trying to create a program that will read in multiple .txt files and rename them in one go. I'm able to read them in, but I'm falling flat when it comes to defining them all.
First I tried including an 'as' statement after the open call in my loop, but the files kept overwriting each other since it's only one name I'm defining. I was thinking I could read them in as 'file1', 'file2', 'file3'... etc
Any idea on how I can get this naming step to work in a for loop?
import os
os.chdir("\\My Directory")

#User Inputs:
num_files = 3

#Here, users' actual file names in their directory would be
#'A.txt', 'B.txt', 'C.txt'
filenames = [A, B, C]

j = 1
for i in filenames:
    while j in range(1, num_files):
        open(i + ".txt", 'r').read().split() as file[j]
        j =+ 1
I was hoping that each time it read in the file, it would define each one as file#. Clearly, my syntax is wrong because of the way I'm indexing 'file'. I've tried using another for loop in the for loop, but that gave me a syntax error as well. I'm really new to python and programming logic in general. Any help would be much appreciated.
Thank you!
You should probably use the rename() function in the os module. An example could be:
import os
os.rename("stackoverflow.html", "xyz.html")
Here, stackoverflow.html is the current name of the file and xyz.html is the name you want to give it (or the destination path). Hope this helps!
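If the other half of the problem is reading several files without inventing file1, file2, file3 variables, a short sketch (the A/B/C names are the illustrative ones from the question, and the new names are hypothetical) is to keep the contents in a list and rename as you go:

import os

filenames = ["A", "B", "C"]  # illustrative names from the question
contents = []  # contents[0], contents[1], ... play the role of file1, file2, ...
for j, name in enumerate(filenames):
    with open(name + ".txt", "r") as f:
        contents.append(f.read().split())
    os.rename(name + ".txt", "file{}.txt".format(j + 1))  # hypothetical new names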
I was wondering if this is a common problem:
I wrote a program which copies files (if found) from one place to another. The first call is os.path.isfile(wholepath+file),
but this call never returns anything; instead the program just stops there. The problem could be that there are about a million files (multiple TB) in the directory. Is there a better solution in this case?
The program has now been running for an hour and isn't using much CPU (according to htop).
isfile() returns True if path is an existing regular file.
Try this one:
print([fle for fle in os.listdir(wholepath) if os.path.isfile(os.path.join(wholepath, fle))])
Note that your list will return as an empty list if your path consists of only folders and not files.
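If the directory really holds around a million entries, os.listdir has to build that entire list in memory before returning, which can look like a hang. A sketch using os.scandir (Python 3.6+ for the context-manager form), which yields entries lazily; the path is a placeholder for the question's wholepath:

import os

wholepath = "/some/folder"  # placeholder for the question's path

def regular_files(dirpath):
    # Lazily yield the names of regular files, one directory entry at a time.
    with os.scandir(dirpath) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.name

for name in regular_files(wholepath):
    print(name)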
I'm currently trying to write a Python script to rename a bunch of files. The files are named like this: [Name]-[Number]-[Number]. To give a specific example: milk-00-00. The next file is milk-00-01, then 02, 03, and so on up to X. After that, milk-01-00 starts the same pattern again.
What I need to do is replace 'milk' with a number and turn the '-XX-XX' part into '-01', '-02', ...
I hope you get the idea. The current state of my code is pretty poor; it was hard enough to get it this far. At least I'm able to replace something with it, and I'll manage to get rid of the 'milk' with the help of Google. However, if there is an easier way, I'd really appreciate a push in the right direction!
import os
import sys

path = 'C:/Users/milk/Desktop/asd'
i=00
for filename in os.listdir(path):
    if filename.endswith('.tiff'):
        newname = filename.replace('00', 'i')
        os.rename(filename, newname)
        i=i+1
You can use the format function:

temp = ' '.join(filename.split('.')[:-1])
os.rename(filename, '10{}-{}.tiff'.format(temp.split('-')[-2], temp.split('-')[-1]))

Since filename has the .tiff extension, this first creates a version of filename without the extension - temp - and then builds the new name from its last two '-' fields. Alternatively, keeping the counter i from your loop:

os.rename(filename, '1000-%02d.tiff' % i)
i += 1
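Putting that together with the loop from the question (the path, the .tiff filter, and the '1000-' numbering are assumptions carried over from the snippets above; note that os.rename needs the full path, since os.listdir returns bare file names):

import os

path = 'C:/Users/milk/Desktop/asd'  # path from the question
i = 0
for filename in sorted(os.listdir(path)):
    if filename.endswith('.tiff'):
        # e.g. 'milk-00-01.tiff' becomes '1000-01.tiff'
        os.rename(os.path.join(path, filename),
                  os.path.join(path, '1000-%02d.tiff' % i))
        i += 1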
I've got a segment of my script which creates a list of log files to scan through for keywords.
The problem is that the log files are collectively around 11 GB. When I grep through them in the shell, it takes 4 or 5 minutes. When I do it with my Python script, it hangs the server to the point where I need to reboot it.
It doesn't seem right that it would take the whole server down, but in reality I don't need it to go through all the files anyway, just those modified within the last week.
I've got this so far:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
I assume I will need to add something prior to this to initially filter out the wrong files?
I've been playing with os.path.getmtime in this format:
logs = [log for log in glob('/var/opt/cray/log/p0-current/*') if not os.path.isdir(log)]
for log in logs:
    mtime = os.path.getmtime(log)
    if mtime < "604800":
        do-stuff (create a new list? Or update logs?)
That's kind of where I am now, and it doesn't work but I was hoping there was something more elegant I could do with the list inline?
Depending on how many filenames there are and how little memory you have (512 MB VPS?), it's possible you're running out of memory creating two lists of all the filenames (one from glob and one from your list comprehension). Not necessarily the case, but it's all I have to go on.
Try switching to iglob (which uses os.scandir under the hood and returns an iterator) and using a generator expression and see if that helps.
Also, getmtime gets a time, not an interval from now.
import os
import glob
import time

week_ago = time.time() - 7 * 24 * 60 * 60

log_files = (
    x for x in glob.iglob('/var/opt/cray/log/p0-current/*')
    if not os.path.isdir(x)
    and os.path.getmtime(x) > week_ago
)

for filename in log_files:
    pass  # do something