Python FTP: get the most recent file by date

I am using ftplib to connect to an FTP site. I want to get the most recently uploaded file and download it. I am able to connect to the FTP server and list the files; I have also put them in a list and converted the date field. Is there any function or module that can find the most recent date and output the whole line from the list?
#!/usr/bin/env python
import ftplib
import os
import socket
import sys
import time
HOST = 'test'
def main():
    try:
        f = ftplib.FTP(HOST)
    except (socket.error, socket.gaierror) as e:
        print('cannot reach to %s' % HOST)
        return
    print("Connect to ftp server")
    try:
        f.login('anonymous', 'al#ge.com')
    except ftplib.error_perm:
        print('cannot login anonymously')
        f.quit()
        return
    print("logged on to the ftp server")
    data = []
    f.dir(data.append)
    for line in data:
        datestr = ' '.join(line.split()[0:2])
        orig_date = time.strptime(datestr, '%d-%m-%y %H:%M%p')
    f.quit()
    return
if __name__ == '__main__':
    main()
RESOLVED:
    # inside main(), after logging in
    data = []
    f.dir(data.append)
    datelist = []
    filelist = []
    for line in data:
        col = line.split()
        datestr = ' '.join(col[0:2])
        # %I (12-hour clock) is needed for %p (AM/PM) to take effect
        date = time.strptime(datestr, '%m-%d-%y %I:%M%p')
        datelist.append(date)
        filelist.append(col[3])
    combo = zip(datelist, filelist)
    who = dict(combo)
    for key in sorted(who, reverse=True):
        print("%s: %s" % (key, who[key]))
        filename = who[key]
        print("file to download is %s" % filename)
        try:
            with open(filename, 'wb') as out:
                f.retrbinary('RETR %s' % filename, out.write)
        except ftplib.error_perm:
            print("Error: cannot read file %s" % filename)
            os.unlink(filename)
        else:
            print("***Downloaded*** %s " % filename)
        break  # stop after the newest file; f.quit() below still runs
    f.quit()
    return
One remaining problem: is it possible to retrieve the first element from the dictionary? Here the for loop runs only once and exits, which gives me the first sorted value. That works, but I don't think it is good practice.
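Instead of a loop that exits after one iteration, `max()` can pick the newest entry directly. A minimal sketch, assuming the `datelist` and `filelist` lists built in the RESOLVED code (here replaced by hypothetical sample values in the same listing format); `newest_file` is a hypothetical helper name. Keeping the dates and names paired as tuples also avoids the dict silently dropping files that share a timestamp.

```python
import time

def newest_file(datelist, filelist):
    """Return the file name paired with the most recent date.

    datelist holds time.struct_time values (as returned by time.strptime);
    filelist holds the matching file names in the same order.
    """
    if not datelist:
        return None
    # max() compares the (date, name) tuples by date first
    latest_date, latest_name = max(zip(datelist, filelist))
    return latest_name

# Hypothetical sample data in the MM-DD-YY HH:MMAM/PM listing format
dates = [time.strptime(s, '%m-%d-%y %I:%M%p')
         for s in ['01-05-12 09:15AM', '03-02-12 04:30PM', '02-20-12 11:00AM']]
names = ['old.csv', 'newest.csv', 'middle.csv']
print(newest_file(dates, names))  # prints newest.csv
```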

For those looking for a full solution for finding the latest file in a folder:
MLSD
If your FTP server supports the MLSD command, the solution is easy:
entries = list(ftp.mlsd())
entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
latest_name = entries[0][0]
print(latest_name)
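The MLSD listing usually also contains entries for the current and parent directories and for subfolders (reported through the `type` fact), and a server may leave out `modify` unless it is requested. A sketch that keeps only regular files, assuming the server reports the `type` and `modify` facts; `latest_file_from_mlsd` is a hypothetical helper name, and it accepts any iterable of `(name, facts)` pairs like the one `ftp.mlsd()` yields.

```python
def latest_file_from_mlsd(entries):
    """Return the name of the most recently modified regular file.

    entries is an iterable of (name, facts) pairs, as yielded by
    ftplib.FTP.mlsd(); facts is a dict with lowercase fact names.
    """
    files = [(facts["modify"], name)
             for name, facts in entries
             if facts.get("type") == "file" and "modify" in facts]
    if not files:
        return None
    # "modify" values are YYYYMMDDHHMMSS[.sss] strings, so plain string
    # comparison orders them chronologically
    return max(files)[1]

# With a live connection (not run here):
#   latest = latest_file_from_mlsd(ftp.mlsd(facts=["type", "modify"]))

sample = [
    (".", {"type": "cdir", "modify": "20200101000000"}),
    ("a.zip", {"type": "file", "modify": "20180327120000"}),
    ("b.zip", {"type": "file", "modify": "20180618153100"}),
    ("sub", {"type": "dir", "modify": "20210101000000"}),
]
print(latest_file_from_mlsd(sample))  # prints b.zip
```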
LIST
If you need to rely on the obsolete LIST command, you have to parse the proprietary listing it returns.
A common *nix listing looks like:
-rw-r--r-- 1 user group 4467 Mar 27 2018 file1.zip
-rw-r--r-- 1 user group 124529 Jun 18 15:31 file2.zip
With a listing like this, this code will do:
from dateutil import parser
# ...
lines = []
ftp.dir("", lines.append)
latest_time = None
latest_name = None
for line in lines:
    # maxsplit=8 keeps a file name containing spaces whole in tokens[8]
    tokens = line.split(maxsplit=8)
    time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
    time = parser.parse(time_str)
    if (latest_time is None) or (time > latest_time):
        latest_name = tokens[8]
        latest_time = time
print(latest_name)
This is a rather fragile approach, as the listing format is not standardized and differs between servers.
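If you would rather not depend on the third-party dateutil package, the date columns can be parsed with the standard library. A sketch assuming the *nix format shown above with English month abbreviations, where recent entries show `Mon DD HH:MM` (no year) and older ones show `Mon DD YYYY`. `parse_unix_list_time` and `latest_from_unix_listing` are hypothetical helper names, and filling in the missing year as the current one (or the previous one if that would lie in the future) is an assumption about how such listings behave.

```python
from datetime import datetime

def parse_unix_list_time(month, day, year_or_time, now=None):
    """Parse the three date tokens of a *nix-style LIST line into a datetime."""
    if now is None:
        now = datetime.now()
    if ":" in year_or_time:
        # "Jun 18 15:31": no year given. Try the current year, then the
        # previous one if the date would be in the future (or invalid,
        # e.g. Feb 29 in a non-leap year).
        for year in (now.year, now.year - 1):
            try:
                parsed = datetime.strptime(
                    f"{month} {day} {year} {year_or_time}", "%b %d %Y %H:%M")
            except ValueError:
                continue
            if parsed <= now:
                return parsed
        raise ValueError(f"cannot place {month} {day} {year_or_time} in time")
    # "Mar 27 2018": year given, time of day unknown
    return datetime.strptime(f"{month} {day} {year_or_time}", "%b %d %Y")

def latest_from_unix_listing(lines, now=None):
    """Return the name of the newest entry in a list of *nix LIST lines."""
    latest_time, latest_name = None, None
    for line in lines:
        tokens = line.split(maxsplit=8)  # tokens[8] keeps names with spaces
        if len(tokens) < 9:
            continue  # skip "total N" and other short lines
        t = parse_unix_list_time(tokens[5], tokens[6], tokens[7], now)
        if latest_time is None or t > latest_time:
            latest_time, latest_name = t, tokens[8]
    return latest_name

lines = [
    "-rw-r--r-- 1 user group 4467 Mar 27 2018 file1.zip",
    "-rw-r--r-- 1 user group 124529 Jun 18 15:31 file2.zip",
]
print(latest_from_unix_listing(lines, now=datetime(2018, 7, 1)))  # prints file2.zip
```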
MDTM
A more reliable, but far less efficient, approach is to use the MDTM command to retrieve the timestamps of individual files/folders:
names = ftp.nlst()
latest_time = None
latest_name = None
for name in names:
    time = ftp.voidcmd("MDTM " + name)
    if (latest_time is None) or (time > latest_time):
        latest_name = name
        latest_time = time
print(latest_name)
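The loop above compares the raw MDTM replies as strings, which works because every successful reply has the form `213 YYYYMMDDHHMMSS` (RFC 3659, which also allows optional fractional seconds). If you need an actual timestamp, for example to display it or to compare it with local files, the reply can be converted as below. `parse_mdtm_reply` is a hypothetical helper name; it returns a timezone-aware datetime in UTC, since RFC 3659 defines MDTM times as UTC.

```python
from datetime import datetime, timezone

def parse_mdtm_reply(reply):
    """Convert an MDTM reply such as '213 20180618153100' to an aware UTC datetime."""
    code, _, value = reply.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {reply!r}")
    # Keep only YYYYMMDDHHMMSS; drop optional fractional seconds ".sss"
    stamp = datetime.strptime(value.strip()[:14], "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)

# With a live connection (not run here):
#   modified = parse_mdtm_reply(ftp.voidcmd("MDTM " + name))

print(parse_mdtm_reply("213 20180618153100.123"))  # prints 2018-06-18 15:31:00+00:00
```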
For an alternative version of the code, see the answer by @Paulo.
Non-standard -t switch
Some FTP servers support a proprietary, non-standard -t switch for the NLST (or LIST) command.
lines = ftp.nlst("-t")
latest_name = lines[-1]
See How to get files in FTP folder sorted by modification time.
Downloading found file
No matter what approach you use, once you have the latest_name, you download it as any other file:
with open(latest_name, 'wb') as f:
    ftp.retrbinary('RETR ' + latest_name, f.write)
See also
Get the latest FTP folder name in Python
How to get FTP file's modify time using Python ftplib

Why not use the following dir option?
ftp.dir('-t',data.append)
With this option (on servers that support -t), the file listing is ordered by time from newest to oldest. Then just retrieve the first file in the list and download it.

With NLST, as shown in Martin Prikryl's response,
you can use the sorted function:
from ftplib import FTP
ftp = FTP(host="127.0.0.1", user="u", passwd="p")
ftp.cwd("/data")
file_name = sorted(ftp.nlst(), key=lambda x: ftp.voidcmd(f"MDTM {x}"))[-1]

If you have all the dates as time.struct_time values (strptime gives you these) in a list, then all you have to do is sort the list.
Here's an example:
#!/usr/bin/python
import time
dates = [
    "Jan 16 18:35 2012",
    "Aug 16 21:14 2012",
    "Dec 05 22:27 2012",
    "Jan 22 19:42 2012",
    "Jan 24 00:49 2012",
    "Dec 15 22:41 2012",
    "Dec 13 01:41 2012",
    "Dec 24 01:23 2012",
    "Jan 21 00:35 2012",
    "Jan 16 18:35 2012",
]
def main():
    datelist = []
    for date in dates:
        date = time.strptime(date, '%b %d %H:%M %Y')
        datelist.append(date)
    print(datelist)
    datelist.sort()
    print(datelist)
if __name__ == '__main__':
    main()

I don't know what your FTP server's listing looks like, but your example did not work for me. I changed some lines related to the date sorting part:
import sys
import ftplib
from ftplib import FTP
import os
import socket
import time
# Connects to the ftp
ftp = FTP(ftpHost)
ftp.login(yourUserName, yourPassword)
data = []
datelist = []
filelist = []
ftp.dir(data.append)
for line in data:
    col = line.split()
    # *nix listing: month, day and HH:MM are columns 5-7, the name is column 8
    # (entries that show a year instead of HH:MM would fail to parse here)
    datestr = ' '.join(col[5:8])
    date = time.strptime(datestr, '%b %d %H:%M')
    datelist.append(date)
    filelist.append(col[8])
combo = zip(datelist, filelist)
who = dict(combo)
for key in sorted(who, reverse=True):
    print("%s: %s" % (key, who[key]))
    filename = who[key]
    print("file to download is %s" % filename)
    try:
        with open(filename, 'wb') as out:
            ftp.retrbinary('RETR %s' % filename, out.write)
    except ftplib.error_perm:
        print("Error: cannot read file %s" % filename)
        os.unlink(filename)
    else:
        print("***Downloaded*** %s " % filename)
    break  # only the newest file is wanted
ftp.quit()

Related

Why is Python running a different script?

I'm on Windows and I have a script called csv.py. I recently installed pandas and created anotherscript.py. The only code in anotherscript.py is import pandas.
When I run py anotherscript.py, all it does is run csv.py. I have renamed csv.py to something else and it is still getting called.
If I remove import pandas, it works. If I move anotherscript.py to a different folder, it works fine. It looks like something is cached.
What am I missing?
anotherscript.py
import pandas
cmd call and output
C:\Users\*****>py anotherscript.py
0 634
1 Saturday, January 8, 2022
2 15:00 EST
.
.
.
<cal file created and uploaded>
csv.py
This script scrapes a webpage and creates a calendar file
import openpyxl
from openpyxl import load_workbook
from ics import Calendar, Event
from datetime import datetime
import pytz
from ftplib import FTP
import ftplib
import urllib.request
import requests
response = requests.post("urlRetrated")
with open('u7.xlsx', 'wb') as s:
    s.write(response.content)
wb_obj = openpyxl.load_workbook('u7.xlsx')
worksheet = wb_obj.active
data = []
c = Calendar()
EST = pytz.timezone('US/Eastern')
for count, row_cells in enumerate(worksheet.iter_rows(min_row=2, values_only=True)):
    for count, cell in enumerate(row_cells):
        data.append(cell)
    date_and_time = data[1] + " " + data[2].strip('EST ')
    game_datetime = datetime.strptime(date_and_time, '%A, %B %d, %Y %H:%M')
    if 'SoccerTeam' in data[3]:
        data[3] = 'William'
    if 'SoccerTeam' in data[5]:
        data[5] = 'William'
    game_title = data[3] + " Vs " + data[5]
    game_location = data[6]
    e = Event()
    e.name = game_title
    # with pytz, localize() applies the correct offset; replace(tzinfo=...) does not
    e.begin = EST.localize(game_datetime)
    e.location = game_location
    e.created = datetime.today()
    c.events.add(e)
    with open('marcos.ics', 'w', newline='') as f:
        f.write(str(c))
    for index, value in enumerate(data):
        print(index, value)
    data = []
user = '****'
pas = '*****'
try:
    ftp = ftplib.FTP('*****', user, pas)
    print(ftp.getwelcome())
    ftp.cwd('public_html')
    file = open('will.ics', 'rb')
    ftp.storbinary('STOR will.ics', file)
    file.close()
    ftp.quit()
except ftplib.error_perm as error:
    if error:
        print('Login Failed')
Thanks
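pandas imports the standard-library `csv` module, and when a script is run, its own directory comes first on `sys.path`, so a file named `csv.py` next to `anotherscript.py` can be imported in place of the standard module. One way to check which file a module name resolves to, without importing it, is `importlib.util.find_spec`. A minimal diagnostic sketch; `module_origin` is a hypothetical helper name.

```python
import importlib.util
import sys

def module_origin(name):
    """Return the file a top-level module name would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

# Run this from the same folder as anotherscript.py. If csv resolves to a file
# in that folder instead of the Python installation's Lib directory, a local
# csv.py is shadowing the standard-library module.
print("first entry on sys.path:", sys.path[0])
print("csv resolves to:", module_origin("csv"))
```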

Copy files with their creation date in a specific range

I am currently trying to copy files from one folder to another using shutil, but I can't seem to get it to work: the process says it has finished, but nothing happens.
For the current criteria, I have added a raw_input prompt that lets the user choose the file extension.
The next criterion I want to add is a date range filter, so I can choose a range such as:
17/07/2020 to 04/08/2020, or today's date.
UPDATED CODE:
import os
import shutil
import os.path, time
from pip._vendor.distlib.compat import raw_input
os.chdir('C://')
src = ("C:/Users/eldri/OneDrive/Desktop/")
dst = ("C:/Users/eldri/OneDrive/Desktop/output")
ext = raw_input("[+] File format: ")
created = (" last modified: %s" % time.ctime(os.path.getmtime(src)))
start = raw_input("[+] Date start: ")
end = raw_input("[+] Date end: ")
def date_to_num(date):
    return int("".join(date.split('/')[::-1]))
def date_in_range(date, start, end):
    return date_to_num(date) > date_to_num(start) and date_to_num(date) < date_to_num(end)
for filename in os.listdir(src):
    if filename.endswith('.' + ext) and created.startswith(start) and created.endswith(end):
        shutil.copy(src + filename, dst)
        print("[+] File transferred " + filename + created)
    else:
        print("[+] File not transferred " + filename + created)
print("[+] Transfer complete")
I was looking at pandas, maybe, but I'm not sure, as I'm still quite new to Python.
Example in the terminal:
file extension = .csv
startdate = 12/05/2020
enddate = 07/08/2020
Once the user has entered these fields, it would copy only the matching files over.
The current output for the created files is:
[+] File transferred BASE1011.xls last modified: Fri Jul 17 10:11:40 2020
[+] File transferred BASE1112.xls last modified: Fri Jul 17 10:11:40 2020
[+] File transferred BASE1213.xls last modified: Fri Jul 17 10:11:40 2020
[+] File transferred BASE1314.xls last modified: Fri Jul 17 10:11:40 2020
[+] File transferred BASE1415.xls last modified: Fri Jul 17 10:11:40 2020
I want these to be in an easier format for user input, as explained above.
Example: start date = 12/05/2020, end date = 07/08/2020
Thank you for your help. I am not the best at Python, but I am trying to learn, so any help would be amazing.
Thanks
I've carried on your work using time.ctime(os.path.getmtime(src)) and created a function dateRange(createdDate, startDate, endDate) that uses datetime to convert the strings into datetime objects and returns True if the file's date falls between the start and end dates, and False otherwise.
import os
import shutil
import time
from datetime import datetime, timedelta
src = "C:/Users/eldri/OneDrive/Desktop/"
dst = "C:/Users/eldri/OneDrive/Desktop/output"
ext = input("[+] File format: ")  # "txt"
start = input("[+] Date start: ")  # "01/07/2020"
end = input("[+] Date end: ")  # "30/07/2020"
def dateRange(createdDate, startDate, endDate):
    """determines if date is in range (start and end dates inclusive)"""
    createdDate = datetime.strptime(createdDate, '%a %b %d %H:%M:%S %Y')
    startDate = datetime.strptime(startDate, '%d/%m/%Y')
    # add one day so that files modified during the end date are included
    endDate = datetime.strptime(endDate, '%d/%m/%Y') + timedelta(days=1)
    return startDate <= createdDate < endDate
for filename in os.listdir(src):
    created = time.ctime(os.path.getmtime(os.path.join(src, filename)))
    if filename.endswith('.' + ext) and dateRange(created, start, end):
        shutil.copy(os.path.join(src, filename), dst)
        print("[+] File transferred " + filename + " " + created)
    else:
        print("[+] File not transferred " + filename + " " + created)
print("[+] Transfer complete")
I've added example values as comments after the ext, start, and end prompts to show the expected format.
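The round trip through `time.ctime` and back to `datetime` can be skipped by converting the modification timestamp straight to a `datetime`. A sketch using the same dd/mm/yyyy input format and treating both ends of the range as inclusive whole days (an assumption about what the user wants); `files_modified_between` is a hypothetical helper name. Note that `os.path.getmtime` gives the last-modified time, not the creation time.

```python
import os
from datetime import datetime, timedelta

def files_modified_between(folder, ext, start, end):
    """Yield paths in folder with the given extension whose modification
    date falls between start and end (dd/mm/yyyy strings, both inclusive)."""
    start_dt = datetime.strptime(start, "%d/%m/%Y")
    # end is inclusive: accept anything before midnight of the following day
    end_dt = datetime.strptime(end, "%d/%m/%Y") + timedelta(days=1)
    for filename in os.listdir(folder):
        path = os.path.join(folder, filename)
        if not os.path.isfile(path) or not filename.endswith("." + ext):
            continue
        modified = datetime.fromtimestamp(os.path.getmtime(path))
        if start_dt <= modified < end_dt:
            yield path

# Usage with shutil (src and dst as in the question):
#   for path in files_modified_between(src, "csv", "12/05/2020", "07/08/2020"):
#       shutil.copy(path, dst)
```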
For a specific range, you can create a function that parses the date into a number:
def date_to_num(date):
    return int("".join(date.split('/')[::-1]))
def date_in_range(date, start, end):
    return date_to_num(date) > date_to_num(start) and date_to_num(date) < date_to_num(end)
And then use it like this:
date_in_range("03/02/2020", "01/01/2020", "05/05/2020")
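`date_to_num` relies on every date being zero-padded: `"03/02/2020"` becomes `20200203`, but `"3/2/2020"` becomes `202023`, which compares wrongly. `datetime.strptime` accepts both forms, since `%d` and `%m` also match single digits. A sketch keeping the same exclusive bounds; `parse_dmy` is a hypothetical helper name.

```python
from datetime import datetime

def parse_dmy(date):
    """Parse a dd/mm/yyyy string; a single-digit day or month is accepted too."""
    return datetime.strptime(date, "%d/%m/%Y").date()

def date_in_range(date, start, end):
    # Same exclusive bounds as the version above, but compares real dates
    return parse_dmy(start) < parse_dmy(date) < parse_dmy(end)

print(date_in_range("03/02/2020", "01/01/2020", "05/05/2020"))  # True
print(date_in_range("3/2/2020", "01/01/2020", "05/05/2020"))    # True
```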

How to move only new or updated files?

I am trying to create a script that will move only new or updated files from the past 24 hours into a new folder. So far I have a script that moves files in general; any leads or suggestions would be greatly appreciated.
import os, shutil
source = os.listdir(r'C:\Users\Student\Desktop\FolderA')
destination = r'C:\Users\Student\Desktop\FolderB'
os.chdir(r'C:\Users\Student\Desktop\FolderA')
for files in os.listdir(r"C:\Users\Student\Desktop\FolderA"):
    if files.endswith(".txt"):
        src = os.path.join(r"C:\Users\Student\Desktop\FolderA", files)
        dst = os.path.join(destination, files)
        shutil.move(src, dst)
I believe I found a solution; let me know what you think.
# copy files from folder_a to folder_b
# if the files in folder_a have been modified within the past 24 hours
# copy them to folder_b
#
import shutil
import os
from os import path
import datetime
def file_has_changed(fname):
    global ready_to_archive
    # get file modified time
    file_m_time = datetime.datetime.fromtimestamp(path.getmtime(fname))
    # get the delta between now and the file's modification time
    td = datetime.datetime.now() - file_m_time
    # file can be archived if modified within the last 24 hours
    if td.days == 0:
        ready_to_archive = ready_to_archive + 1
        return True
    else:
        return False
def main():
    global ready_to_archive
    global archived
    ready_to_archive, archived = 0, 0
    src_folder = r'c:\users\gail\Desktop\FolderA'
    dst_folder = r'c:\users\gail\Desktop\FolderB'
    for fname in os.listdir(src_folder):
        src_fname = os.path.join(src_folder, fname)
        if file_has_changed(src_fname):
            try:
                shutil.copy2(src_fname, dst_folder)
                archived = archived + 1
            except IOError as e:
                print('could not open the file: %s ' % e)
if __name__ == "__main__":
    main()
    print('****** Archive Report for %s ******' % datetime.datetime.now())
    print('%d files ready for archiving ' % ready_to_archive)
    print('%d files archived' % archived)
    print('****** End of Archive Report ******')

Capture files that have been modified in the past x days in Python

I'm using the below script to re-encode my existing media files to MP4 using the HandBrake CLI. It's going to be a long process, so I'd like to have a way to capture files that have been created in the past 7 days, as well as the other filters (on file extensions), so that new content can be updated, while older content can be run on a separate script at different times. What do I have to change in the script to only capture files created in the past 7 days?
import os
import time
import subprocess
import sys
import http.client
import urllib.parse
from xml.dom import minidom
import logging
import datetime
#Script Directory setup
myDateTime = datetime.datetime.now().strftime("%y-%m-%d-%H-%M")
logdir = 'D:\\logs\\'
logfile = logdir + 'TV_encode-' + myDateTime + '.log'
#Log Handler Setup
logger = logging.getLogger('TV_encode')
hdlr = logging.FileHandler(logfile)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.INFO)
logger.info('Getting list of files to re-encode...')
fileList = []
rootdir = 'T:\\'
logger.info('Using %s as root directory for scan...' % rootdir)
for root, subFolders, files in os.walk(rootdir):
    for file in files:
        theFile = os.path.join(root, file)
        fileName, fileExtension = os.path.splitext(theFile)
        if fileExtension.lower() in ('.avi', '.divx', '.flv', '.m4v', '.mkv', '.mov', '.mpg', '.mpeg', '.wmv'):
            print('Adding', theFile)
            logger.info('Adding %s to list of file to re-encode.' % theFile)
            fileList.append(theFile)
runstr = '"C:\\Program Files\\Handbrake\\HandBrakeCLI.exe" -i "{0}" -o "{1}" --preset="Normal" --two-pass --turbo'
print('=======--------=======')
logger.info('=======--------=======')
logger.info('Starting processing of files...')
while fileList:
    inFile = fileList.pop()
    logger.info('Original file: %s' % inFile)
    fileName, fileExtension = os.path.splitext(inFile)
    outFile = fileName + '.mp4'
    logger.info('New file: %s' % outFile)
    print('Processing', inFile)
    logger.info('Processing %s' % inFile)
    returncode = subprocess.call(runstr.format(inFile, outFile))
    time.sleep(5)
    print('Removing', inFile)
    logger.info('Removing %s' % inFile)
    os.remove(inFile)
    logger.info('Sending Pushover notification...')
    conn = http.client.HTTPSConnection("api.pushover.net:443")
    conn.request("POST", "/1/messages.json",
                 urllib.parse.urlencode({
                     "token": "TOKENHERE",
                     "user": "USERKEY",
                     "message": "Re-encoding complete for %s" % fileName,
                 }), {"Content-type": "application/x-www-form-urlencoded"})
    conn.getresponse()
os.path.getmtime(filename) will give you the modification time in seconds since the epoch.
Use the datetime module to convert it to a datetime object, and compare it as usual.
import datetime
import os
ONE_WEEK_AGO = datetime.datetime.today() - datetime.timedelta(days=7)
mod_date = datetime.datetime.fromtimestamp(os.path.getmtime(theFile))
if mod_date > ONE_WEEK_AGO:
    fileList.append(theFile)  # use the file, e.g. add it to the question's fileList
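Folded into the question's `os.walk` loop, the check might look like the sketch below, assuming you want files whose last modification falls within the past `days` days; `recent_media_files` is a hypothetical helper name. If you really need the creation time: on Windows `os.path.getctime` returns the creation time, while on Unix it returns the time of the last metadata change.

```python
import os
from datetime import datetime, timedelta

MEDIA_EXTENSIONS = ('.avi', '.divx', '.flv', '.m4v', '.mkv',
                    '.mov', '.mpg', '.mpeg', '.wmv')

def recent_media_files(rootdir, days=7, extensions=MEDIA_EXTENSIONS):
    """Return media files under rootdir modified within the last `days` days."""
    cutoff = datetime.now() - timedelta(days=days)
    found = []
    for root, _subfolders, files in os.walk(rootdir):
        for name in files:
            # case-insensitive extension check, as in the question
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            path = os.path.join(root, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            if modified > cutoff:
                found.append(path)
    return found

# e.g. fileList = recent_media_files('T:\\', days=7)
```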

How to append a file's creation date to its filename?

I would like to create a Python script that appends the file's creation date to the end of the filename, while retaining the original file name (Report), for a batch of PDF documents.
directory = T:\WISAARD_Web Portal Projects\PortalLogging\WebLogExpert
filenames = Report.pdf
import os, time
root = "/home"
path = os.path.join(root, "dir1")
os.chdir(path)
for files in os.listdir("."):
    if files.endswith(".pdf"):
        f, ext = os.path.splitext(files)
        # this is just an example; you can use strftime, strptime etc. to format the date as desired
        d = time.ctime(os.path.getmtime(files)).split()  # e.g. ['Fri', 'Jul', '17', '10:11:40', '2020']
        filedate = d[-1] + "-" + d[1] + "-" + d[2]  # year-month-day, e.g. 2020-Jul-17
        newname = f + filedate + ext
        try:
            os.rename(files, newname)
        except Exception as e:
            print(e)
        else:
            print("ok: renamed %s to %s " % (files, newname))
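Formatting the timestamp with `time.strftime` avoids picking the `time.ctime()` string apart. A sketch with a hypothetical helper `dated_name`, which builds the new name from the original name and a timestamp; the `_` separator and the `%Y-%m-%d` default are illustrative choices. Passing `os.path.getctime(path)` instead of `os.path.getmtime(path)` gives the creation time on Windows.

```python
import os
import time

def dated_name(filename, timestamp, fmt="%Y-%m-%d"):
    """Return filename with the formatted local date inserted before the extension,
    e.g. 'Report.pdf' -> 'Report_2020-07-17.pdf'."""
    base, ext = os.path.splitext(filename)
    return base + "_" + time.strftime(fmt, time.localtime(timestamp)) + ext

# Renaming every PDF in the current folder (not run here):
#   for name in os.listdir("."):
#       if name.endswith(".pdf"):
#           os.rename(name, dated_name(name, os.path.getmtime(name)))
```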
