I am new to programming, and even newer to Python, so please excuse any ignorance on my part. I am trying to write a script for myself that will move files that have been modified in the last 24 hours. So far I have come up with this:
import datetime
import os
import shutil

src = "C:\Users\Student\Desktop\FolderA"
dst = "C:\Users\Student\Desktop\FolderB"

now = dt.datetime.now()
before = now - dt.timedelta(hours=24)

def mins_since_mod(fname):
    return (os.path.getmtime(fname))

for fname in os.listdir(src):
    if mins_since_mod > before:
        src_fname = os.path.join(src,fname)
        os.path.join(dst,fname)
        shutil.move(src_fname, dst)
I know I'm close to the solution, but I can't figure out how to get this to work. I looked around the community and was not able to find a solution to my problem. Thank you for any leads or suggestions.
There are a few things to change. First, you can't compare the datetime in before to the Unix timestamp that getmtime() returns. It's easier to work with Unix timestamps throughout. Also, you actually need to call mins_since_mod() with the full path of the file for it to do anything; as written, the function object itself is compared to before.
Here's something that should work, changing the name of mins_since_mod() to reflect what it does better:
import time
import os
import shutil

SECONDS_IN_DAY = 24 * 60 * 60

# Raw strings keep the backslashes in Windows paths from being read as escapes.
src = r"C:\Users\Student\Desktop\FolderA"
dst = r"C:\Users\Student\Desktop\FolderB"

now = time.time()
before = now - SECONDS_IN_DAY

def last_mod_time(fname):
    return os.path.getmtime(fname)

for fname in os.listdir(src):
    src_fname = os.path.join(src, fname)
    if last_mod_time(src_fname) > before:
        dst_fname = os.path.join(dst, fname)
        shutil.move(src_fname, dst_fname)
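The same comparison can be wrapped in a function so that the cutoff is adjustable. This is only a minimal sketch: the names move_recent_files and max_age_seconds are hypothetical (they are not part of the answer above), and skipping subdirectories is an assumption about what the script should do.

```python
import os
import shutil
import time


def move_recent_files(src, dst, max_age_seconds=24 * 60 * 60):
    """Move regular files in src modified within max_age_seconds to dst.

    Hypothetical helper for illustration; returns the names that were moved.
    """
    cutoff = time.time() - max_age_seconds
    moved = []
    for fname in os.listdir(src):
        src_fname = os.path.join(src, fname)
        # Skip subdirectories; only plain files are considered here.
        if not os.path.isfile(src_fname):
            continue
        # Both sides are Unix timestamps in seconds, so they compare directly.
        if os.path.getmtime(src_fname) > cutoff:
            shutil.move(src_fname, os.path.join(dst, fname))
            moved.append(fname)
    return sorted(moved)
```

Calling move_recent_files(src, dst) with the two folder paths from the question would then behave like the loop above, apart from ignoring subdirectories.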
Hey mate, I have actually just done something like this myself. I found that there are a few issues with the time comparison, as well as some issues in comparing and moving folders.
Try this:
import os
import shutil
import time

def filter_by_date(src_folder, archive_date):
    os.chdir(src_folder)
    delay_time = 24 * 60 * 60
    # Both values are Unix timestamps (seconds), so they can be compared directly.
    archive_period = archive_date - delay_time
    return [
        name for name in os.listdir(u'.')
        if os.path.isdir(name)
        and os.path.getmtime(name) < archive_period
    ]

if __name__ == '__main__':
    folders = filter_by_date("C:/Users/Student/Desktop/FolderA", time.time())
    for files in folders:
        print(files)
        try:
            shutil.copytree(files, os.path.join("C:/Users/Student/Desktop/New", files))
        except OSError as e:
            print('\nDirectory not copied. Error: %s' % e)
        except shutil.Error as e:
            try:
                files = files.encode('UTF-8')
                dst_path = os.path.join('C:/Users/Student/Desktop/FolderB/', files)
                shutil.copytree(files, dst_path)
            finally:
                print('\nDirectory not copied. Error: %s' % e)
    print("\nCompleted")
This ensures that any folder (directory or sub-directory) is copied, including ones whose names contain Chinese, Russian, or Japanese characters. It will also keep all file attributes.
I have a very interesting case. I have built a file management system in Python which moves files from a source to a destination or an archive every time I run it. Now I want to create two tables in MySQL (using Python) that monitor the file management system.
The first table records the last time the file management system ran. It is just a small table with one column and one row containing, for example: Last run: 1-1-2020 10:30.
The second table has to hold, in table form, all the content of the last file or files that were moved from source to destination.
Every time I run my Python script, two things need to happen: 1. the files are moved, and 2. the MySQL monitoring tables are updated. Does anyone know how this can be done? Please note I am using MySQL Workbench 8.0. Thank you.
Here is the code I have right now for moving the files.
import os
import time
from datetime import datetime
import pathlib

SOURCE = r'C:\Users\AM\Desktop\Source'
DESTINATION = r'C:\Users\AM\Desktop\Destination'
ARCHIVE = r'C:\Users\AM\Desktop\Archive'

def get_time_difference(date, time_string):
    """
    You may want to modify this logic to change the way the time difference is calculated.
    """
    time_difference = datetime.now() - datetime.strptime(f"{date} {time_string}", "%d-%m-%Y %H:%M")
    hours = time_difference.total_seconds() // 3600
    minutes = (time_difference.total_seconds() % 3600) // 60
    return f"{int(hours)}:{int(minutes)}"

def move_and_transform_file(file_path, dst_path, delimiter="\t"):
    """
    Reads the data from the old file, writes it into the new file and then
    deletes the old file.
    """
    with open(file_path, "r") as input_file, open(dst_path, "w") as output_file:
        data = {
            "Date": None,
            "Time": None,
            "Power": None,
        }
        time_difference_seen = False
        for line in input_file:
            (line_id, item, line_type, value) = line.strip().split()
            if item in data:
                data[item] = value
                if not time_difference_seen and data["Date"] is not None and data["Time"] is not None:
                    time_difference = get_time_difference(data["Date"], data["Time"])
                    time_difference_seen = True
                    print(delimiter.join([line_id, "TimeDif", line_type, time_difference]), file=output_file)
                if item == "Power":
                    value = str(int(value) * 10)
            print(delimiter.join((line_id, item, line_type, value)), file=output_file)
    os.remove(file_path)

def process_files(all_file_paths, newest_file_path, subdir):
    """
    For each file, decide where to send it, then perform the transformation.
    """
    for file_path in all_file_paths:
        if file_path == newest_file_path and os.path.getctime(newest_file_path) < time.time() - 120:
            dst_root = DESTINATION
        else:
            dst_root = ARCHIVE
        dst_path = os.path.join(dst_root, subdir, os.path.basename(file_path))
        move_and_transform_file(file_path, dst_path)

def main():
    """
    Gather the files from the directories and then process them.
    """
    for subdir in os.listdir(SOURCE):
        subdir_path = os.path.join(SOURCE, subdir)
        if not os.path.isdir(subdir_path):
            continue
        all_file_paths = [
            os.path.join(subdir_path, p)
            for p in os.listdir(subdir_path)
            if os.path.isfile(os.path.join(subdir_path, p))
        ]
        if all_file_paths:
            newest_path = max(all_file_paths, key=os.path.getctime)
            process_files(all_file_paths, newest_path, subdir)

if __name__ == "__main__":
    main()
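One possible shape for the two monitoring tables is sketched below. It is an illustration under loud assumptions, not a tested MySQL solution: it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table names last_run and moved_rows, the column names, and the function record_run are all hypothetical. With MySQL the same pattern would go through a DB-API connector, whose placeholder syntax may differ from SQLite's "?".

```python
import sqlite3
from datetime import datetime


def record_run(conn, moved_rows, now=None):
    """Replace the contents of both monitoring tables for the latest run.

    moved_rows: iterable of (line_id, item, line_type, value) tuples taken
    from the file(s) that were just moved. All names here are hypothetical.
    """
    now = now or datetime.now()
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS last_run (run_at TEXT)")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS moved_rows "
        "(line_id TEXT, item TEXT, line_type TEXT, value TEXT)"
    )
    # Table 1: a single row holding the time of the last run.
    cur.execute("DELETE FROM last_run")
    cur.execute(
        "INSERT INTO last_run (run_at) VALUES (?)",
        (now.strftime("%d-%m-%Y %H:%M"),),
    )
    # Table 2: the content of the file(s) moved during this run.
    cur.execute("DELETE FROM moved_rows")
    cur.executemany(
        "INSERT INTO moved_rows (line_id, item, line_type, value) "
        "VALUES (?, ?, ?, ?)",
        list(moved_rows),
    )
    conn.commit()
```

In the script above, the rows could be collected inside move_and_transform_file() while each line is written, and record_run() called once at the end of main().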
I have a program that gets the modified date/time of directories and files. I then want to get the date/time from 30 seconds ago and compare that to the modified date/time.
If the modified time is less than 30 seconds ago, I want to trigger an alert. My code is triggering alert even if the modified time occurred more than 30 seconds ago.
Is there a way I can only trigger an alert if the modification occurred less than 30 seconds ago?
import os.path
import time, stat
import sys

share_dir = 'C:/mydir'
source_dir = r'' + share_dir + '/'

def trigger():
    print("Triggered")

def check_dir():
    while True:
        for currentdir, dirs, files in os.walk(source_dir):
            for file in files:
                currentfile = os.path.join(currentdir, file)
                # get modified time for files
                ftime = os.stat(currentfile )[stat.ST_MTIME]
                past = time.time() - 30 # last 30 seconds
                if time.ctime(ftime) >= time.ctime(past):
                    print(time.ctime(ftime) + " > " + time.ctime(past))
                    print("Found modification in last 30 seconds for file =>", currentfile, time.ctime(ftime))
                    trigger()
                    sys.exit()
                else:
                    print('No recent modifications.' + currentfile)
            for folder in dirs:
                currentfolder = os.path.join(currentdir, folder)
                # get modified time for directories
                dtime = os.stat(currentfolder)[stat.ST_MTIME]
                past = time.time() - 30 # last 30 seconds
                if time.ctime(dtime) >= time.ctime(past):
                    print(time.ctime(dtime) + " > " + time.ctime(past))
                    print("Found modification in last 30 seconds for folder =>", currentfolder, time.ctime(dtime))
                    trigger()
                    sys.exit()
                else:
                    print('No recent modifications: ' + currentfolder)
        time.sleep(4)

if __name__ == "__main__":
    check_dir()
I'm doing this on a large scale file system. I personally use SQLite3 and round the mtime of the file (I had weird things happen using any other sort of operation and it was more consistent).
I'm also unsure why you're not just doing a pure math solution. Take the current time, take the mtime of the file, find the difference between them and if it's less than or equal to thirty, you get a hit.
I redid some of the code. I recommend trying this:
import os.path
import time, stat
import sys

def trigger():
    print("Triggered")

def check_dir(source_dir):
    for currentdir, dirs, files in os.walk(source_dir):
        for file in files:
            currentfile = os.path.join(currentdir, file)
            # get modified time for files
            ftime = os.path.getmtime(currentfile)
            # compare the age in seconds, not the formatted date strings
            if time.time() - ftime <= 30:
                print("Found modification in last 30 seconds for file =>", currentfile, time.ctime(ftime))
                trigger()
                sys.exit(0)
            else:
                print('No recent modifications.' + currentfile)
        for folder in dirs:
            currentfolder = os.path.join(currentdir, folder)
            # get modified time for directories
            dtime = os.stat(currentfolder)[stat.ST_MTIME]
            if time.time() - dtime <= 30:
                print("Found modification in last 30 seconds for folder =>", currentfolder, time.ctime(dtime))
                trigger()
                sys.exit(0)
            else:
                print('No recent modifications: ' + currentfolder)

if __name__ == "__main__":
    check_dir('yourdirectoryhere')
Did some light testing on my own system and it seemed to work perfectly. Might want to add back the while loop but it should work.
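The arithmetic check can also be isolated in a small helper, which makes it easy to reuse for both files and folders. This is a sketch only: the name modified_within and its now parameter are hypothetical. It also shows why the original misfires: time.ctime() returns strings such as "Fri ..." or "Mon ...", and comparing those orders them alphabetically rather than chronologically.

```python
import os
import time


def modified_within(path, seconds, now=None):
    """Return True if path was modified no more than `seconds` ago.

    Hypothetical helper; `now` can be passed in so that one walk over the
    tree uses a single reference time.
    """
    if now is None:
        now = time.time()
    # Both values are Unix timestamps, so the difference is an age in seconds.
    return now - os.path.getmtime(path) <= seconds
```

Inside the loops above, the two conditions would then read modified_within(currentfile, 30) and modified_within(currentfolder, 30).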
I am trying to find a range of specific files in a directory using Python 2.7.
I have many files in a directory that are named like AB_yyyyjjjhhmmss_001.txt, where y is year, j is Julian date, h is hour and so on. Each time corresponds to the time some data was taken, not necessarily the time the file was created or modified. I would like to pick out a range of times, say from 2013305010000 to 2013306123000, and process the matching files.
I have something like,
import glob

def get_time (start_time = None, end_time = None):
    if start_time == None:
        start_time = input("start: ")
    if end_time == None:
        end_time = input("end: ")
    duration = str(start_time) + "-" + str(end_time)
    listing = glob.glob("*_[" + duration + "]_*")
I learned that [ ] only matches a single character, so I am totally off track here. I also tried a {start_time..end_time} combination, to no avail.
If all files have the same structure, you can simply write:
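import os
import sys

start = sys.argv[1]
end = sys.argv[2]

for filename in os.listdir('test'):
    # The timestamps are fixed-width digit strings, so string comparison
    # gives the same order as numeric comparison.
    if start <= filename.split('_')[1] <= end:
        print("Process %s" % filename)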
Example:
$ ls test
AB_2013105010000_001.txt AB_2013305010000_001.txt AB_2013306103000_001.txt
AB_2013306123000_001.txt AB_2013316103000_001.txt
$ python t.py 2013305010000 2013306123000
Process AB_2013305010000_001.txt
Process AB_2013306103000_001.txt
Process AB_2013306123000_001.txt
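The comparison works because all timestamps have the same width and are zero-padded, so comparing them as strings orders them chronologically. A minimal sketch of the same idea as a reusable function follows; the name files_in_range is hypothetical, and skipping names that do not have exactly three underscore-separated fields is an added assumption.

```python
def files_in_range(filenames, start, end):
    """Return names whose timestamp field lies in [start, end], inclusive.

    Assumes names look like AB_yyyyjjjhhmmss_001.txt, so the second
    underscore-separated field is a fixed-width, zero-padded digit string.
    """
    selected = []
    for name in filenames:
        parts = name.split('_')
        # Skip anything that does not have the expected three fields.
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        if start <= parts[1] <= end:
            selected.append(name)
    return sorted(selected)
```

Passing os.listdir('test') together with the two command-line arguments would select the same three files as in the example run above.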
I might try
import re
import os
import datetime

def filename_to_datetime(filename):
    filedate = re.match(r'.*(\d{13}).*', filename)
    if filedate:
        # %j is the day of the year (001-366)
        return datetime.datetime.strptime(filedate.group(1), '%Y%j%H%M%S')
    else:
        raise ValueError("File has wrong format!")

def get_time(start_time, end_time):
    # start_time and end_time are datetime.datetime objects
    return [filename for filename in os.listdir('.') if
            start_time < filename_to_datetime(filename) < end_time]
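Since get_time() compares datetime objects, the bounds have to be datetimes as well. The sketch below shows one way to build them from the same 13-digit strings used in the file names; parse_stamp, select_between and TIMESTAMP_FORMAT are hypothetical names, and the stricter pattern (timestamp between two underscores) is an assumption based on the naming scheme in the question.

```python
import datetime
import re

TIMESTAMP_FORMAT = '%Y%j%H%M%S'  # year, day of year, hour, minute, second


def parse_stamp(filename):
    """Extract the 13-digit timestamp from a name and parse it."""
    match = re.search(r'_(\d{13})_', filename)
    if match is None:
        raise ValueError("File has wrong format: %r" % filename)
    return datetime.datetime.strptime(match.group(1), TIMESTAMP_FORMAT)


def select_between(filenames, start_stamp, end_stamp):
    """Return names whose timestamp lies strictly between the two stamps."""
    start = datetime.datetime.strptime(start_stamp, TIMESTAMP_FORMAT)
    end = datetime.datetime.strptime(end_stamp, TIMESTAMP_FORMAT)
    return [name for name in filenames if start < parse_stamp(name) < end]
```

For example, the stamp 2013305010000 parses to 1 November 2013, 01:00:00, because day 305 of 2013 is 1 November. Note that the strict comparison excludes files that fall exactly on either bound.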
I have written a script in python using pywin32 to save pdf files to text that up until recently was working fine. I use similar methods in Excel. The code is below:
def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
    outputLoc = os.path.dirname(pdf)
    outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')
    try:
        win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
        adobe = win32com.client.DispatchEx('AcroExch.App')
        pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
        pdDoc.Open(pdf)
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
    except:
        traceback.print_exc()
        return False
    finally:
        del jObject
        pdDoc.Close()
        del pdDoc
        adobe.Exit()
        del adobe
However this code has suddenly stopped working and I get the following output:
Traceback (most recent call last):
File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt
jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__
ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
False
I have similar code written in VB that works correctly so I'm guessing that it has something to do with the COM interfaces not binding to the appropriate functions correctly? (my COM knowledge is patchy).
Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html
I admit that the post above is not the easiest to find (probably because Google scores it low based on the age of the content?).
Specifically, applying this piece of advice will get things running for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
For reference, the complete piece of code that does not require you to manually patch dynamic.py (snippet should run pretty much out of the box):
# gets all files under ROOT_INPUT_PATH with FILE_EXTENSION and tries to extract text from them into ROOT_OUTPUT_PATH with same filename as the input file but with INPUT_FILE_EXTENSION replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT
import winerror
# try importing scandir and, if found, use it, as it is significantly faster than the stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk
import fnmatch
import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat
    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert(ret) # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?
    pdDoc = avDoc.GetPDDoc()
    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))
    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()
    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")
    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>
    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'
    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]
    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))
    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (the list is mutated in place, so no global statement is needed at module level)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)
    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)
I have tested this on WinPython x64 2.7.6.3, Acrobat X Pro
makepy.py is a script that comes with the win32com python package.
Running it for your installation "wires" python into the COM/OLE object in Windows. The following is an excerpt of some code I used to talk to Excel and do some stuff in it. This example gets the name of sheet 1 in the current workbook. It automatically runs makepy if it has an exception:
# Python 2 syntax (print statements, "except Exception, detail")
import os
import re
import string
import sys
import win32com
import win32com.client
from win32com.client import selecttlb

def attachExcelCOM():
    makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py'
    typeList = selecttlb.EnumTlbs()
    for tl in typeList:
        if (re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE)):
            makepyCmd = "%s -d \"%s\"" % (makepyExe, tl.desc)
            os.system(makepyCmd)
        # end if
    # end for
# end def

def getSheetName(sheetNum):
    try:
        xl = win32com.client.Dispatch("Excel.Application")
        wb = xl.Workbooks.Item(sheetNum)
    except Exception, detail:
        print 'There was a problem attaching to Excel, refreshing connect config...'
        print Exception, str(detail)
        attachExcelCOM()
        try:
            xl = win32com.client.Dispatch("Excel.Application")
            wb = xl.Workbooks.Item(sheetNum)
        except:
            print 'Could not attach to Excel...'
            sys.exit(-1)
        # end try/except
    # end try/except
    wsName = wb.Name
    if (wsName == 'PERSONAL.XLS'):
        return( None )
    # end if
    print 'The target worksheet is:'
    print ' ', wsName
    print 'Is this correct? [Y/N]',
    answer = string.strip( sys.stdin.readline() )
    answer = answer.upper()
    if (answer != 'Y'):
        print 'Sheet not identified correctly.'
        return(None)
    # end if
    return( (wb, wsName) )
# end def

# -- Main --
# sheetNum is set elsewhere in the original script; this is only an excerpt
sheetInfo = getSheetName(sheetNum)
if (sheetInfo == None):
    print 'Sheet not found'
    sys.exit(-1)
else:
    (wb, wsName) = sheetInfo
# end if
Running the following code:
import os
import datetime
import ftplib

currdate = datetime.datetime.now()
formatdate = currdate.strftime("%m-%d-%Y %H%M")

def log():
    fqn = os.uname()[1]
    ext_ip = urllib2.urlopen('http://whatismyip.org').read()
    log = open ('/Users/admin/Documents/locatelog.txt','w')
    log.write(str("Asset: %s " % fqn))
    log.write(str("Checking in from IP#: %s" % ext_ip))
    smush = str(fqn +' # ' + formatdate)
    os.rename('/Users/admin/Documents/locatelog.txt','/Users/admin/Documents/%s.txt' % smush )
    s = ftplib.FTP('10.7.1.71','username','password')
    f = open('/Users/admin/Documents/%s.txt' % smush,'r')
    s.storbinary("STOR /Users/admin/Documents/%s.txt" % smush,f)
Generates the following error:
ftplib.error_perm: 550 /Users/admin/Documents/678538.local # 02-24-2010 1301.txt: No such file or directory
I have a feeling something is amiss in this line :
s.storbinary("STOR /Users/admin/Documents/%s.txt" % smush,f)
678538 is the host I am testing on...using Mac OS X 10.5 and Python 2.5.1
Shouldn't it be f = open('/Users/admin/Documents/%s.txt' % smush,'r')? Notice the / in front of Users.
If you don't put the first /, the script will treat the path to the file as relative to the current directory (where the script is run from).
Edit:
I'm not too familiar with Python (I wish I were), but shouldn't it be:
s.storbinary('STOR /Users/admin/Documents/%s.txt' % smush,f) ?
In your example, Python will treat your string as literal and you want to interpolate the value of smush with %s
Edit 2:
Does the directory /Users/admin/Documents/ exist on your server? If not, I think you will have to create them before copying anything. (Since the error message is about some files/folders missing).
You can create them yourself first. Run your script. If the file is copied successfully, then you can add the creation of the directories from within your script.
Remove all spaces from the file name. E.g., in smush = str(fqn +' # ' + formatdate), you are putting a space before and after the "#". Your path looks like
/Users/admin/Documents/something # something
and when you pass it to ftplib, it may cause a problem. Another way is to try putting quotes around the path, e.g.
s.storbinary("STOR '/Users/admin/Documents/%s.txt'" % smush,f)
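A minimal sketch of building a name without spaces or "#" in the first place. The helper name ftp_safe_name and the choice of "-" as the replacement character are assumptions for illustration only.

```python
import re


def ftp_safe_name(hostname, stamp):
    """Build a file name without spaces or '#' from a host name and a stamp.

    Hypothetical helper: any run of characters other than letters, digits,
    '.', '_' and '-' is collapsed into a single '-'.
    """
    raw = "%s-%s" % (hostname, stamp)
    return re.sub(r'[^A-Za-z0-9._-]+', '-', raw) + '.txt'
```

With the host and date from the error message, ftp_safe_name('678538.local', '02-24-2010 1301') gives '678538.local-02-24-2010-1301.txt'.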
Edit:
This version works: Problem was that I was writing to buffer, and not to file.
import os
import urllib2
import datetime
import ftplib

currdate = datetime.datetime.now()
formatdate = currdate.strftime("%m-%d-%Y-%H%M")

def log():
    fqn = os.uname()[1]
    ext_ip = urllib2.urlopen('http://whatismyip.org').read()
    smush = str(fqn + formatdate)
    s = ftplib.FTP('10.7.1.71','username','password')
    f = open('/Users/admin/Documents/%s.txt' % smush,'w')
    f.write(str("Asset: %s " % fqn))
    f.write('\n')
    f.write(str("Checking in from IP#: %s" % ext_ip))
    f.write('\n')
    f.write(str("On: %s" % formatdate))
    # close() must be called (with parentheses) so the buffer is flushed to disk
    f.close()
    f = open('/Users/admin/Documents/%s.txt' % smush,'rb')
    s.storbinary('STOR %s.txt' % smush , f)
    s.close()
    f.close()
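The buffer problem can also be avoided with with blocks, which flush and close the file on exit. The following is a sketch under assumptions rather than a drop-in replacement: write_report and upload_report are hypothetical names, and ftp is expected to be an already connected ftplib.FTP instance.

```python
import os


def write_report(path, fqn, ext_ip, stamp):
    """Write the check-in report; the with block flushes and closes the file."""
    with open(path, 'w') as f:
        f.write("Asset: %s\n" % fqn)
        f.write("Checking in from IP#: %s\n" % ext_ip)
        f.write("On: %s\n" % stamp)


def upload_report(ftp, path):
    """Upload path into the FTP session's current directory.

    ftp is expected to be a connected ftplib.FTP instance; only the base
    name is sent, so no directory has to exist on the server.
    """
    with open(path, 'rb') as f:
        ftp.storbinary('STOR %s' % os.path.basename(path), f)
```

Sending only the base name in the STOR command mirrors the working version above, which dropped the local /Users/admin/Documents/ prefix from the remote path.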