I'm trying to fetch a directory tree from an SFTP server with the following structure:
main_dir/
    dir1/
        file1
    dir2/
        file2
I tried to achieve this with the commands below:
sftp.get_r(main_path + dirpath, local_path)
or
sftp.get_d(main_path + dirpath, local_path)
The local path is like d:/grabbed_files/target_dir, and the remote is like /data/some_dir/target_dir.
With get_r I get a FileNotFound exception. With get_d I get an empty directory (when the target directory contains only files, not directories, it works fine).
I'm totally sure that the directory exists at this path. What am I doing wrong?
This one works for me, but note that when you download a directory it recreates the full remote path locally:
pysftp.Connection.get_r()
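A minimal sketch of how that looks (host, credentials and paths are placeholders taken from the question):

import pysftp

# Placeholders: swap in your own host, credentials and paths.
with pysftp.Connection('sftp.example.com', username='user', password='secret') as sftp:
    # get_r recreates the full remote path under the local directory, so
    # this ends up in d:/grabbed_files/data/some_dir/target_dir/...
    sftp.get_r('/data/some_dir/target_dir', 'd:/grabbed_files', preserve_mtime=True)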
I also created simple download and upload methods:
import pathlib
# helpers is my own utility module; logger is my module-level logger.

def download_r(sftp, outbox):
    tmp_dir = helpers.create_tmpdir()
    assert sftp.isdir(str(outbox))
    assert pathlib.Path(tmp_dir).is_dir()
    sftp.get_r(str(outbox), str(tmp_dir))
    # get_r recreates the remote path under tmp_dir, so point at it
    tmp_dir = tmp_dir / outbox
    return tmp_dir

def upload_r(sftp, inbox, files):
    assert sftp.isdir(str(inbox))
    files = pathlib.Path(files)
    if files.is_dir():
        logger.debug(list(files.iterdir()))
        sftp.put_r(str(files), str(inbox))
    else:
        logger.debug('No files here.')
I didn't understand why it didn't work, so I ended up with my own recursive solution:
# main_path, target_path and grab_file() are defined elsewhere in my script.
def grab_dir_rec(sftp, dirpath):
    local_path = target_path + dirpath
    full_path = main_path + dirpath
    if not sftp.exists(full_path):
        return
    if not os.path.exists(local_path):
        os.makedirs(local_path)
    dirlist = sftp.listdir(remotepath=full_path)
    for i in dirlist:
        if sftp.isdir(full_path + '/' + i):
            grab_dir_rec(sftp, dirpath + '/' + i)
        else:
            grab_file(sftp, dirpath + '/' + i)
If you want a context-manager wrapper around pysftp that does this for you, here is a solution that takes even less code (after you copy/paste the GitHub gist) and ends up looking like the following when used:
path = "sftp://user:password@test.com/path/to/file.txt"
# Read a file
with open_sftp(path) as f:
s = f.read()
print(s)
# Write to a file
with open_sftp(path, mode='w') as f:
f.write("Some content.")
The (fuller) example: http://www.prschmid.com/2016/09/simple-opensftp-context-manager-for.html
This context manager also has auto-retry logic baked in, in case you can't connect on the first attempt (which happens more often than you'd expect in a production environment...).
Oh, and yes, this assumes you are only getting one file per connection, as it will auto-close the FTP connection.
The context manager gist for open_sftp: https://gist.github.com/prschmid/80a19c22012e42d4d6e791c1e4eb8515
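For reference, a rough sketch of what such a context manager could look like (this is not the gist's actual code and has no retry logic; it assumes pysftp and a URL of the form shown above):

from contextlib import contextmanager
from urllib.parse import urlparse

import pysftp

@contextmanager
def open_sftp(url, mode='r'):
    # Rough sketch only -- see the gist above for the real, retry-capable version.
    parts = urlparse(url)  # sftp://user:password@host/path/to/file.txt
    with pysftp.Connection(parts.hostname, username=parts.username,
                           password=parts.password) as sftp:
        with sftp.open(parts.path, mode) as f:
            yield f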
I am working on a project where I need to sort .jpg files and folders that contain .jpg files. I have other functional scripts that I intend to incorporate into this Python script later. For now, the first script below counts the number of underscores in a file name and takes action based on the result, and that part works. I need help creating logic that goes through the .jpg image files and, if a file has more than one underscore, moves it into an error folder. Any feedback on how to optimize this script would also be greatly appreciated!
from pathlib import Path
import shutil, os, time, glob

timestr = time.strftime("%Y%m%d-%H%M%S")
folder = 'D:\\test\\testing'
working_folder = 'DitaTest1'
full_path = Path(os.path.join(folder, working_folder))
test_path = folder + '\\' + working_folder

for file_path in full_path.iterdir():
    file_name = file_path.name
    result = file_name.count('_')
    if file_path.is_file():
        os.chdir(test_path)
        for file in glob.glob("*.jpg"):
            dst = test_path + "\\" + file.replace(" ", "_").replace(".jpg", "")  # .replace("Angle","").replace("Front","").replace("Side","")
            os.mkdir(dst)
            # print(dst)
            shutil.move(file, dst)
    elif result != 1:
        if not file_path.is_file():
            shutil.move(os.path.join(folder, working_folder, file_name),
                        os.path.join(folder, working_folder + ' - dir-ERRORS_' + timestr, file_name))
    else:
        print('Ignored operation')
You need to explain more so that we can understand it better, but from what I have read, your if logic seems to be wrong: if you want to check the number of underscores, you shouldn't put that logic in the elif. Try something like this instead.
for file_path in full_path.iterdir():
    file_name = file_path.name
    result = file_name.count('_')
    if os.path.isdir(file_path):
        pass
    else:
        if result == 1:
            os.chdir(test_path)
            for file in glob.glob("*.jpg"):
                dst = test_path + "\\" + file.replace(" ", "_").replace(".jpg", "")  # .replace("Angle","").replace("Front","").replace("Side","")
                os.mkdir(dst)
                # print(dst)
                shutil.move(file, dst)
        else:
            shutil.move(os.path.join(folder, working_folder, file_name),
                        os.path.join(folder, working_folder + ' - dir-ERRORS_' + timestr, file_name))
What this code does is iterate over the folder: if it finds a directory it just passes, and when it finds a file it checks whether result == 1. If so, it moves the file to your desired folder; otherwise it moves it to the error folder. If I made a mistake, let me know.
I am attempting to use Python to connect to a server and upload some files from my local directory to /var/www/html, but every time I try to do this I get this error:
Error: ftplib.error_perm: 553 Could not create file.
I have already done a chown and a chmod -R 777 on the path. I am using vsftpd and have already enabled write access. Does anyone have any ideas?
Code:
from ftplib import FTP
import os

ftp = FTP('ipaddress')
ftp.login(user='user', passwd='user')
ftp.cwd('/var/www/html')

for root, dirs, files in os.walk(path):  # path is the local directory being uploaded
    for fname in files:
        full_fname = os.path.join(root, fname)
        ftp.storbinary('STOR' + fname, open(full_fname, 'rb'))
I had a similar problem also getting the error 553: Could not create file. What (update: partially) solved it for me was changing this line from:
ftp.storbinary('STOR' + fname, open(full_fname, 'rb'))
to:
ftp.storbinary('STOR ' + '/' + fname, open(full_fname, 'rb'))
Notice that there is a space just after 'STOR ', and I added a forward slash ('/') just before the filename to indicate that I'd like the file stored in the FTP root directory.
UPDATE: [2016-06-03]
Actually this only solved part of the problem. I realized later that it was a permissions problem: the FTP root directory allowed writing by the FTP user, but I had manually created folders within it as another user, so the new directories did not allow the FTP user to write to them.
Possible solutions:
1. Change the permissions on the directories so that the FTP user is the owner of these directories, or is at least able to read and write to them.
2. Create the directories using the ftp.mkd(dir_name) function, then change directory using the ftp.cwd(dir_name) function, and then use the appropriate STOR function (storlines or storbinary) to write the file to the current directory.
As far as my understanding goes, the STOR command only takes a filename as a parameter (not a file path), which is why you need to make sure you are in the correct working directory before using the STOR function. (Remember the space after the STOR command.)
ftp.storbinary('STOR ' + fname, open(full_fname, 'rb'))
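Putting that together, a minimal sketch of option 2 above (the remote directory name is a placeholder; ftp, fname and full_fname are from the question's code):

# Sketch of solution 2: create the remote directory, enter it, then
# STOR just the filename. 'uploads' is a placeholder directory name.
remote_dir = 'uploads'
if remote_dir not in ftp.nlst():
    ftp.mkd(remote_dir)
ftp.cwd(remote_dir)
with open(full_fname, 'rb') as fh:
    ftp.storbinary('STOR ' + fname, fh)  # note the space after STOR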
Does path == '/var/www/html'? That's a local path. You need an FTP path.
The local path /var/www/html is not generally accessible over FTP. When you connect to the FTP server, the file system presented to you often begins at your user's home directory, e.g. /home/user.
Since it sounds like you're running the ftp server (vsftpd) on the remote machine, the simplest solution might be something like:
user@server:~$ ln -s /var/www/html /home/user/html
Then you could call ftp.cwd('html') and ftp.nlst() to get the remote directory listing, and navigate it from there.
Also, don't forget to put a space character in the 'STOR' string (should be 'STOR ').
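A minimal sketch of that flow, reusing ftp and path from the question (and assuming the symlink above exists on the server):

ftp.cwd('html')                        # the directory symlinked to /var/www/html
print(ftp.nlst())                      # check what the server exposes there
for name in os.listdir(path):          # path = the local directory being uploaded
    with open(os.path.join(path, name), 'rb') as fh:
        ftp.storbinary('STOR ' + name, fh)   # note the space after 'STOR'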
Best of luck!
I'm sure at this point you have found a solution, but I just stumbled across this thread while looking for one myself. I ended up using the following:
from os import path

# Handles FTP transfer to server
def upload(ftp, dir, file):
    # Test if directory exists. If not, create it
    if dir.split('/')[-1] not in ftp.nlst('/'.join(dir.split('/')[:-1])):
        print("Creating directory: " + dir)
        ftp.mkd(dir)

    # Check if file extension is text format
    ext = path.splitext(file)[1]
    if ext.lower() in (".txt", ".htm", ".html"):
        ftp.storlines("STOR " + dir + '/' + file, open(dir + '/' + file, "rb"))
    else:
        ftp.storbinary("STOR " + dir + '/' + file, open(dir + '/' + file, "rb"), 1024)
I am working on creating a script to watch a folder, grab any new .zip files, and then upload them via FTP to a predetermined area. Right now FTP testing is being performed locally, since the environment isn't yet created.
The strategy I am taking is to first unzip into a local folder, then perform ftplib.storbinary on the file from the local folder to the FTP destination. The unzipping process appears to be working, but I am getting a "file does not exist" error, although I can see the file in the folder itself.
Also, is there any way to unzip directly into an FTP location? I haven't been able to find one, hence the approach I am taking.
Thanks. Local FTP info has been removed from the code. All relevant paths in this code will be changed, most likely to something dynamic, but for now this is a local environment.
extractZip2.py
import zipfile
import ftplib
import os
import logging
import time
from socket import error as socket_error

# Logging Setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('__name__')

FTPaddress = ''
FTPusername = ''
FTPpassword = ''
ftp_destination_location = ''

path_to_watch = "C:/Users/206420055/Desktop/test2/"
before = dict([(f, None) for f in os.listdir(path_to_watch)])
temp_destination_location = "C:/Users/206420055/Desktop/temp/"

def unzip(fullPath, temporaryPath):
    with zipfile.ZipFile(fullPath, "r") as z:
        logger.info("Unzipping {0}".format(fullPath))
        z.extractall(temporaryPath)
        logger.info("Unzipped into local directory {0}".format(temp_destination_location))

def check_or_create_ftp(session, folder):
    """
    Checks to see if necessary folder for currenttab is available.
    Creates the folder if not found, and enters it.
    """
    if folder not in session.nlst():
        logger.info('Directory for {0} does not exist, creating directory\n'.format(folder))
        session.mkd(folder)
    session.cwd(folder)

def check_or_create(temp_destination):
    """
    Checks to see if local savepath exists. Will create savepath if not exists.
    """
    if not os.path.exists(temp_destination):
        logger.info('Directory for %s does not exist, creating directory\n' % temp_destination)
        os.makedirs(str(temp_destination))

def transfer(address, username, password, filename, destination):
    logger.info("Creating Session")
    try:
        session = session_init(address, username, password, destination)
    except (socket_error, ftplib.error_perm) as e:
        logger.error(str(e))
        logger.error("Error in Session Init")
    else:
        try:
            logger.info("Sending File {0}".format(filename))
            send_file(session, filename)
        except (IOError, OSError, ftplib.error_perm) as e:
            logger.error(e)

def session_init(address, username, password, path):
    session = ftplib.FTP(address, username, password)
    check_or_create_ftp(session, path)
    logger.info("Session Established")
    return session

def send_file(session, filename):
    file = open(filename, 'rb')
    logger.info('Sending File : STOR ' + filename)
    session.storbinary('STOR ' + filename, file)
    file.close()

def delete_local_files(savepath, file):
    logger.info("Cleaning Up Folder {0}".format(savepath))
    os.remove(file)

while 1:
    time.sleep(5)
    after = dict([(f, None) for f in os.listdir(path_to_watch)])
    added = [f for f in after if not f in before]
    removed = [f for f in before if not f in after]
    if added: print "Added: ", ", ".join(added)
    before = after

    check_or_create(temp_destination_location)

    if added:
        for file in added:
            print file
            if file.endswith('.zip'):
                unzip(path_to_watch + file, temp_destination_location)
                temp_files = os.listdir(temp_destination_location)
                print("Temp Files {0}".format(temp_files))
                for tf in temp_files:
                    print("TF {0}".format(tf))
                    transfer(FTPaddress, FTPusername, FTPpassword, tf, ftp_destination_location)
                    #delete_local_files(temp_destination_location, tf)
    else:
        pass
Edit: adding an error screenshot (image not reproduced here). It shows the file present in the temp folder, while the console shows the "file does not exist" error.
Just change it to:
from glob import glob

zips_in_path = dict([(f, None) for f in glob("{base_path}/*.zip".format(base_path=path_to_watch))])
os.listdir does not include the path_to_watch part of the path (it returns just the filenames), whereas glob does.
So you could also do:
after = dict([(os.path.join(path_to_watch, f), None) for f in os.listdir(path_to_watch)])
Using either of these methods you should be able to get the full paths of the files in the folder.
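A quick illustration of the difference (the folder and file names are just examples):

import os
from glob import glob

print(os.listdir("C:/Users/me/Desktop/test2"))   # bare filenames, e.g. ['a.zip', 'notes.txt']
print(glob("C:/Users/me/Desktop/test2/*.zip"))   # paths that include the folder, already filtered to *.zip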
I need to copy all the files and folders in the current folder to a subdirectory. What would be the best way to do so? I tried the following snippet, but it fails if the destination directory already exists.
def copy(d=os.path.curdir):
    dest = "t"
    for i in os.listdir(d):
        if os.path.isdir(i):
            shutil.copytree(i, dest)
        else:
            shutil.copy(i, dest)
I have the feeling that the same task can be done in a better and easier manner. How do I do it?
I would never do it in Python, but the following solution came to mind. It doesn't look simple, but it should work and can be simplified (I haven't checked it, sorry; no access to a computer right now):
def copyDirectoryTree(directory, destination, preserveSymlinks=True):
    for entry in os.listdir(directory):
        entryPath = os.path.join(directory, entry)
        if os.path.isdir(entryPath):
            entryDest = os.path.join(destination, entry)
            if os.path.exists(entryDest):
                if not os.path.isdir(entryDest):
                    raise IOError("Failed to copy: the destination for the `" + entryPath + "' directory exists and is not a directory")
                copyDirectoryTree(entryPath, entryDest, preserveSymlinks)
            else:
                shutil.copytree(entryPath, entryDest, preserveSymlinks)
        else:  # symlinks and files
            if preserveSymlinks:
                shutil.copy(entryPath, destination)
            else:
                shutil.copy(os.path.realpath(entryPath), destination)
See the code in http://docs.python.org/library/shutil.html, then tweak it a little (e.g. wrap os.makedirs(dst) in a try:).
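A newer alternative, not quite what the answer above suggests but worth noting: on Python 3.8+ copytree can merge into an existing destination directly, so something like this covers the question's case (excluding the destination so it is not copied into itself):

import shutil

# Copy everything in the current directory into ./t, tolerating an
# existing destination (dirs_exist_ok needs Python 3.8+).
shutil.copytree(
    ".", "t",
    ignore=shutil.ignore_patterns("t"),   # skip anything named "t", keeping the destination out of the copy
    dirs_exist_ok=True,
)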
To extend mamnun's answer: if you want to call the OS directly, I'd advise using cp -r, since you seem to want a recursive copy of directories.
Do you really need to use Python? shutil functions cannot copy all file metadata and group permissions, so why not try built-in OS commands like cp on Linux and xcopy on Windows?
You can even run these commands from Python:
import os
os.system("cp file1 file2")
Hope this helps.
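If you go that route, passing an argument list to subprocess is a little safer than building a shell string, and the recursive flag handles directories (the paths are placeholders):

import subprocess

# cp -r copies directories recursively; check=True raises if cp fails.
subprocess.run(["cp", "-r", "source_dir", "dest_dir"], check=True)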
Here is my version of a recursive copy method for Python; it seems to work :)
import os
import shutil

def copy_all(fr, to, overwrite=True):
    fr = os.path.normpath(fr)
    to = os.path.normpath(to)

    if os.path.isdir(fr):
        if (not os.path.exists(to + "/" + os.path.basename(fr)) and not
                os.path.basename(fr) == os.path.basename(to)):
            to += "/" + os.path.basename(fr)
            mkdirs(to)
        for file in os.listdir(fr):
            copy_all(fr + "/" + file, to + "/")
    else:  # symlink or file
        dest = to
        if os.path.isdir(to):
            dest += "/"
            dest += os.path.basename(fr)

        if overwrite and (os.path.exists(dest) or os.path.islink(dest)):
            rm(dest)

        if os.path.isfile(fr):
            shutil.copy2(fr, dest)
        else:  # has to be a symlink
            os.symlink(os.readlink(fr), dest)

def mkdirs(path):
    if not os.path.isdir(path):
        os.makedirs(path)

def rm(path):
    if os.path.isfile(path) or os.path.islink(path):
        os.remove(path)
    elif os.path.isdir(path):
        # empty the directory recursively before removing it
        for file in os.listdir(path):
            fullpath = path + "/" + file
            rm(fullpath)
        os.rmdir(path)
This will not download the contents of sub-directories; how can I do so?
import ftplib
import configparser
import os

directories = []

def add_directory(line):
    if line.startswith('d'):
        bits = line.split()
        dirname = bits[8]
        directories.append(dirname)

def makeDir(archiveTo):
    for dir in directories:
        newDir = os.path.join(archiveTo, dir)
        if os.path.isdir(newDir) == True:
            print("Directory \"" + dir + "\" already exists!")
        else:
            os.mkdir(newDir)

def getFiles(archiveTo, ftp):
    files = ftp.nlst()
    for filename in files:
        try:
            directories.index(filename)
        except:
            ftp.retrbinary('RETR %s' % filename, open(os.path.join(archiveTo, filename), 'wb').write)

def runBackups():
    # Load INI
    filename = 'connections.ini'
    config = configparser.SafeConfigParser()
    config.read(filename)
    connections = config.sections()
    i = 0
    while i < len(connections):
        # Load Settings
        uri = config.get(connections[i], "uri")
        username = config.get(connections[i], "username")
        password = config.get(connections[i], "password")
        backupPath = config.get(connections[i], "backuppath")
        archiveTo = config.get(connections[i], "archiveto")

        # Start Back-ups
        ftp = ftplib.FTP(uri)
        ftp.login(username, password)
        ftp.cwd(backupPath)

        # Map Directory Tree
        ftp.retrlines('LIST', add_directory)

        # Make Directories Locally
        makeDir(archiveTo)

        # Gather Files
        getFiles(archiveTo, ftp)

        # End connection and increase counter.
        ftp.quit()
        i += 1

    print()
    print("Back-ups complete.")
    print()
this should do the trick :)
import sys
import ftplib
import os
from ftplib import FTP

ftp = FTP("ftp address")
ftp.login("user", "password")

def downloadFiles(path, destination):
    # path & destination are str of the form "/dir/folder/something/"
    # path should be the abs path to the root FOLDER of the file tree to download
    try:
        ftp.cwd(path)
        # clone path to destination
        os.chdir(destination)
        os.mkdir(destination[0:len(destination)-1] + path)
        print destination[0:len(destination)-1] + path + " built"
    except OSError:
        # folder already exists at destination
        pass
    except ftplib.error_perm:
        # invalid entry (ensure input form: "/dir/folder/something/")
        print "error: could not change to " + path
        sys.exit("ending session")

    # list children:
    filelist = ftp.nlst()
    for file in filelist:
        try:
            # this will check if file is folder:
            ftp.cwd(path + file + "/")
            # if so, explore it:
            downloadFiles(path + file + "/", destination)
        except ftplib.error_perm:
            # not a folder with accessible content
            # download & return
            os.chdir(destination[0:len(destination)-1] + path)
            # possibly need a permission exception catch:
            with open(os.path.join(destination, file), "wb") as f:
                ftp.retrbinary("RETR " + file, f.write)
            print file + " downloaded"
    return

source = "/ftproot/folder_i_want/"
dest = "/systemroot/where_i_want_it/"
downloadFiles(source, dest)
This is a very old question, but I had a similar need that I wanted to satisfy in a very general manner. I ended up writing my own solution that works very well for me. I've placed it in a Gist here https://gist.github.com/Jwely/ad8eb800bacef9e34dd775f9b3aad987 and pasted it below in case I ever take the gist offline.
Example usage:
import ftplib
ftp = ftplib.FTP(mysite, username, password)
download_ftp_tree(ftp, remote_dir, local_dir)
The code above will look for a directory called "remote_dir" on the ftp host, and then duplicate the directory and its entire contents into the "local_dir".
It invokes the script below.
import ftplib
import os

def _is_ftp_dir(ftp_handle, name, guess_by_extension=True):
    """ simply determines if an item listed on the ftp server is a valid directory or not """
    # if the name has a "." in the fourth to last position, it's probably a file extension
    # this is MUCH faster than trying to set every file to a working directory, and will work 99% of the time.
    if guess_by_extension is True:
        if name[-4] == '.':
            return False

    original_cwd = ftp_handle.pwd()     # remember the current working directory
    try:
        ftp_handle.cwd(name)            # try to set directory to new name
        ftp_handle.cwd(original_cwd)    # set it back to what it was
        return True
    except:
        return False

def _make_parent_dir(fpath):
    """ ensures the parent directory of a filepath exists """
    dirname = os.path.dirname(fpath)
    while not os.path.exists(dirname):
        try:
            os.mkdir(dirname)
            print("created {0}".format(dirname))
        except:
            _make_parent_dir(dirname)

def _download_ftp_file(ftp_handle, name, dest, overwrite):
    """ downloads a single file from an ftp server """
    _make_parent_dir(dest)
    if not os.path.exists(dest) or overwrite is True:
        with open(dest, 'wb') as f:
            ftp_handle.retrbinary("RETR {0}".format(name), f.write)
        print("downloaded: {0}".format(dest))
    else:
        print("already exists: {0}".format(dest))

def _mirror_ftp_dir(ftp_handle, name, overwrite, guess_by_extension):
    """ replicates a directory on an ftp server recursively """
    for item in ftp_handle.nlst(name):
        if _is_ftp_dir(ftp_handle, item):
            _mirror_ftp_dir(ftp_handle, item, overwrite, guess_by_extension)
        else:
            _download_ftp_file(ftp_handle, item, item, overwrite)

def download_ftp_tree(ftp_handle, path, destination, overwrite=False, guess_by_extension=True):
    """
    Downloads an entire directory tree from an ftp server to the local destination
    :param ftp_handle: an authenticated ftplib.FTP instance
    :param path: the folder on the ftp server to download
    :param destination: the local directory to store the copied folder
    :param overwrite: set to True to force re-download of all files, even if they appear to exist already
    :param guess_by_extension: It takes a while to explicitly check if every item is a directory or a file.
        If this flag is set to True, it will assume any file ending with a three character extension ".???" is
        a file and not a directory. Set to False if some folders may have a "." in their -4th position.
    """
    os.chdir(destination)
    _mirror_ftp_dir(ftp_handle, path, overwrite, guess_by_extension)
This is an alternative: you can try using the ftputil package. You can then use it to walk the remote directories and get your files.
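A rough sketch of that approach (ftputil is a third-party package, pip install ftputil; the host, credentials and paths here are placeholders):

import os
import ftputil

with ftputil.FTPHost("ftp.example.com", "user", "password") as host:
    for dirpath, dirnames, filenames in host.walk("/remote/folder"):
        # mirror the remote directory structure locally
        local_dir = os.path.join("local_copy", dirpath.lstrip("/"))
        os.makedirs(local_dir, exist_ok=True)
        for name in filenames:
            host.download(host.path.join(dirpath, name),
                          os.path.join(local_dir, name))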
Using ftp.mlsd() instead of ftp.nlst():
import sys
import ftplib
import os
from ftplib import FTP

def fetchFiles(ftp, path, destination, overwrite=True):
    '''Fetch a whole folder from ftp. \n
    Parameters
    ----------
    ftp : ftplib.FTP object
    path : string ('/dir/folder/')
    destination : string ('D:/dir/folder/') folder where the files will be saved
    overwrite : bool - Overwrite file if already exists.
    '''
    try:
        ftp.cwd(path)
        os.mkdir(destination[:-1] + path)
        print('New folder made: ' + destination[:-1] + path)
    except OSError:
        # folder already exists at the destination
        pass
    except ftplib.error_perm:
        # invalid entry (ensure input form: "/dir/folder/")
        print("error: could not change to " + path)
        sys.exit("ending session")

    # list children:
    filelist = [i for i in ftp.mlsd()]
    print('Current folder: ' + filelist.pop(0)[0])

    for file in filelist:
        if file[1]['type'] == 'file':
            fullpath = os.path.join(destination[:-1] + path, file[0])
            if (not overwrite and os.path.isfile(fullpath)):
                continue
            else:
                with open(fullpath, 'wb') as f:
                    ftp.retrbinary('RETR ' + file[0], f.write)
                print(file[0] + ' downloaded')
        elif file[1]['type'] == 'dir':
            fetchFiles(ftp, path + file[0] + '/', destination, overwrite)
        else:
            print('Unknown type: ' + file[1]['type'])

if __name__ == "__main__":
    ftp = FTP('ftp address')
    ftp.login('user', 'password')

    source = r'/Folder/'
    dest = r'D:/Data/'

    fetchFiles(ftp, source, dest, overwrite=True)

    ftp.quit()
Using ftputil, a fast solution could be:
def download(folder):
    # ftp is an ftputil.FTPHost instance
    for item in ftp.walk(folder):
        print("Creating dir " + item[0])
        os.mkdir(item[0])
        for subdir in item[1]:
            print("Subdirs " + subdir)
        for file in item[2]:
            print(r"Copying File {0} \ {1}".format(item[0], file))
            ftp.download(ftp.path.join(item[0], file), os.path.join(item[0], file))
It is non-trivial, at the very least. In the simplest case, you assume you only have files and directories. This isn't always the case: there are soft links, hard links, and Windows-style shortcuts. Soft links and directory shortcuts are particularly problematic since they make recursive directory structures possible, which would confuse a naively implemented FTP grabber.
How you handle such recursive directories depends on your needs; you might simply not follow soft links, or you might try to detect recursive links. Detecting recursive links is inherently tricky; you cannot do it reliably.
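For what it's worth, one hedged sketch of loop avoidance over SFTP with paramiko (not plain FTP, where you generally cannot resolve links): remember each directory's canonical path via sftp.normalize() and skip anything already visited.

import stat

def walk_once(sftp, path, seen=None):
    # sftp is a paramiko.SFTPClient. Track each directory's canonical path;
    # a symlink that points back up the tree resolves to a path we've seen.
    seen = set() if seen is None else seen
    real = sftp.normalize(path)
    if real in seen:
        return
    seen.add(real)
    for name in sftp.listdir(path):
        child = path.rstrip("/") + "/" + name
        if stat.S_ISDIR(sftp.stat(child).st_mode):   # stat() follows symlinks
            walk_once(sftp, child, seen)
        else:
            print(child)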