I am trying to get the names of all directories on an FTP server and store them in hierarchical order in a multidimensional list or dict.
So for example, a server that contains the following structure:
/www/
    mysite.com
        images
            png
            jpg
at the end of the script, would give me a list such as
['/www/',
    ['mysite.com',
        ['images',
            ['png'],
            ['jpg']
        ]
    ]
]
I have tried using a recursive function like so:
def traverse(dir):
    FTP.dir(dir, traverse)
FTP.dir returns lines in this format:
drwxr-xr-x 5 leavesc1 leavesc1 4096 Nov 29 20:52 mysite.com
so doing line[56:] will give me just the directory name (mysite.com). I use this in the recursive function.
But I cannot get it to work. I've tried many different approaches without success, and I get lots of FTP errors as well: either the directory can't be found (which is a logic issue on my side), or the server returns unexpected errors that leave no log and that I can't debug.
Bottom-line question:
How do I get a hierarchical directory listing from an FTP server?
Here is a naive and slow implementation. It is slow because it tries to CWD into each directory entry to determine whether it is a directory or a file, but it works. One could optimize it by parsing the LIST command's output, but that output is strongly server-implementation dependent.
import ftplib

def traverse(ftp, depth=0):
    """
    Return a recursive listing of an FTP server's contents (starting
    from the current directory).

    The listing is returned as a recursive dictionary, where each key
    maps to the contents of the subdirectory, or to None if the entry
    corresponds to a file.

    :param ftp: ftplib.FTP object
    """
    if depth > 10:
        return ['depth > 10']
    level = {}
    for entry in (path for path in ftp.nlst() if path not in ('.', '..')):
        try:
            ftp.cwd(entry)  # if we can cwd into it, it's a directory
            level[entry] = traverse(ftp, depth + 1)
            ftp.cwd('..')
        except ftplib.error_perm:
            level[entry] = None  # cwd failed: treat the entry as a file
    return level

def main():
    ftp = ftplib.FTP()
    ftp.connect("localhost")
    ftp.login()
    ftp.set_pasv(True)
    print(traverse(ftp))

if __name__ == '__main__':
    main()
Here's a first draft of a Python 3 script that worked for me. It's much faster than calling cwd() on every entry. Pass in the server, port, directory, username, and password as arguments. I left collecting the output into a list as an exercise for the reader.
import ftplib
import sys

def ftp_walk(ftp, dir):
    dirs = []
    nondirs = []
    for item in ftp.mlsd(dir):
        if item[1]['type'] == 'dir':
            dirs.append(item[0])
        else:
            nondirs.append(item[0])
    if nondirs:
        print()
        print('{}:'.format(dir))
        print('\n'.join(sorted(nondirs)))
    else:
        # print(dir, 'is empty')
        pass
    for subdir in sorted(dirs):
        ftp_walk(ftp, '{}/{}'.format(dir, subdir))

ftp = ftplib.FTP()
ftp.connect(sys.argv[1], int(sys.argv[2]))
ftp.login(sys.argv[4], sys.argv[5])
ftp_walk(ftp, sys.argv[3])
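To get the nested structure the original question asked for, the same MLSD listing can be collected into a dictionary instead of printed. This is a sketch under the same assumption that the server supports MLSD; ftp_tree is a hypothetical helper name, not part of ftplib:

```python
import ftplib

def ftp_tree(ftp, path):
    """Recursively build a dict mapping each entry name to its
    subtree (for directories) or to None (for files), using MLSD."""
    tree = {}
    for name, facts in ftp.mlsd(path):
        if name in ('.', '..'):
            continue
        if facts.get('type') == 'dir':
            tree[name] = ftp_tree(ftp, '{}/{}'.format(path, name))
        else:
            tree[name] = None
    return tree
```

Called as ftp_tree(ftp, '/www') against the example layout, this would return something like {'mysite.com': {'images': {'png': {}, 'jpg': {}}}}.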
You're not going to like this, but "it depends on the server" or, more accurately, "it depends on the output format of the server".
Different servers can be set to display different output, so your initial proposal is bound to failure in the general case.
The "naive and slow implementation" above will cause enough errors that some FTP servers will cut you off (which is probably what happened after about 7 of them...).
If the server supports the MLSD command, then use the “a directory and its descendants” code from that answer.
If you are using Python, look at http://docs.python.org/library/os.path.html (os.path.walk) as a model for the interface.
And if there already is a good module for this, don't reinvent the wheel.
I am fairly new to programming and currently working on a Python script. It is supposed to gather all the files and directories that are given as paths inside the program and copy them to a new location that the user chooses as an input.
import shutil
import os
from pathlib import Path
import argparse

src = [
    [insert name of destination directory,
     insert path of file/directory that should be copied]
]

x = input("Please choose a destination path\n>>>")
if not os.path.exists(x):
    os.makedirs(x)
    print("Directory was created")
else:
    print("Existing directory was chosen")
dest = Path(x.strip())

for pfad in src:
    if os.path.isdir(pfad[1]):
        shutil.copytree(pfad[1], dest / pfad[0])
    elif os.path.isfile(pfad[1]):
        pfad1 = Path(dest / pfad[0])
        if not os.path.exists(pfad1):
            os.makedirs(pfad1)
        shutil.copy(pfad[1], dest / pfad[0])
    else:
        print("An error occurred")
        print(pfad)

print("All files and directories have been copied!")
input()
The script itself works just fine. The problem is that I want to write a test that automatically tests the code each time I push it to my GitLab repository. I have been browsing the web for quite some time now but wasn't able to find a good explanation of how to approach writing a test for a script like this.
I would be extremely thankful for any kind of feedback or hints to helpful resources.
First, you should write a test that you can run from the command line.
I suggest you use the argparse module to pass the source and destination directories, so that you can run the script as script.py source_dir dest_dir without human interaction.
Then, once you have a test you can run, you need to add a .gitlab-ci.yml to the root of the project so that you can use the GitLab CI.
If you have never used the GitLab CI, you should start here: https://docs.gitlab.com/ee/ci/quick_start/
After that, you'll be able to add a job to your .gitlab-ci.yml so that a runner with Python installed will run the test. If you don't understand the bold terms in the previous sentence, you need to understand GitLab CI first.
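As a concrete sketch of the first step: once the copy loop is factored out into a function that takes the source list and destination as parameters, it can be exercised in a temporary directory without any user input. The names copy_entries and test_copy_entries below are hypothetical, assuming a pytest-style layout:

```python
import shutil
from pathlib import Path

def copy_entries(src, dest):
    """Copy each (target_name, source_path) pair from src into dest,
    mirroring the logic of the original script."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for name, source in src:
        source = Path(source)
        if source.is_dir():
            shutil.copytree(source, dest / name)
        elif source.is_file():
            target = dest / name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy(source, target)

def test_copy_entries(tmp_path):
    # pytest injects tmp_path as a fresh temporary directory
    (tmp_path / "a.txt").write_text("hello")
    copy_entries([("docs", tmp_path / "a.txt")], tmp_path / "out")
    assert (tmp_path / "out" / "docs" / "a.txt").read_text() == "hello"
```

Because the function touches no global state and asks for no input, the GitLab CI job only needs to install pytest and run it.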
I am using Python on Ubuntu (Linux). I would also like this code to work on modern-ish Windows PCs. I can take or leave Macs, since I have no plan to get one, but I am hoping to make this code as portable as possible.
I have written some code that is supposed to traverse a directory and run a test on all of its subdirectories (and later, some code that will do something with each file, so I need to know how to detect links there too).
I have added a check for symlinks, but I do not know how to protect against hardlinks that could cause infinite recursion. Ideally, I'd like to also protect against duplicate detections in general (root: [A,E], A: [B,C,D], E: [D,F,G], where D is the same file or directory in both A and E).
My current thought is to check whether the path from the root directory to the current folder is the same as the path being tested, and if it isn't, skip it as an instance of a cycle. However, I think that would take a lot of extra I/O, or it might just retrace the (actually cyclic) path that was just created.
How do I properly detect cycles in my filesystem?
def find(self) -> bool:
    if self._remainingFoldersToSearch:
        current_folder = self._remainingFoldersToSearch.pop()
        if not current_folder.is_symlink():
            contents = current_folder.iterdir()
            try:
                for item in contents:
                    if item.is_dir():
                        if item.name == self._indicator:
                            potentialArchive = [x.name for x in item.iterdir()]
                            if self._conf in potentialArchive:
                                self._archives.append(item)
                                if self._onArchiveReadCallback:
                                    self._onArchiveReadCallback(item)
                        else:
                            self._remainingFoldersToSearch.append(item)
                            self._searched.append(item)
                            if self._onFolderReadCallback:
                                self._onFolderReadCallback(item)
            except PermissionError:
                logging.info("Invalid permissions accessing folder:", exc_info=True)
        return True
    else:
        return False
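One portable way to detect cycles is to record the (st_dev, st_ino) pair of every directory already visited: on POSIX systems this identifies the underlying inode, and on modern Windows Python populates st_ino from the NTFS file index, so revisits caused by links or duplicate entries are caught without comparing paths. A minimal sketch of the idea (walk_no_cycles is a hypothetical name, separate from the class above):

```python
from pathlib import Path

def walk_no_cycles(root):
    """Yield every directory under root exactly once, skipping any
    directory whose (st_dev, st_ino) pair was already visited."""
    seen = set()
    stack = [Path(root)]
    while stack:
        folder = stack.pop()
        try:
            st = folder.stat()
        except OSError:
            continue  # vanished or unreadable: skip it
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # link-induced cycle or duplicate: already handled
        seen.add(key)
        yield folder
        try:
            for item in folder.iterdir():
                if item.is_dir() and not item.is_symlink():
                    stack.append(item)
        except PermissionError:
            pass
```

The same seen-set check could be folded into the find() method above in place of the is_symlink() test, which only guards against symlinks, not hardlinks or duplicates.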
I was running into trouble using QCamera for focusing and other things, so I thought I could use the camera software shipped with Windows 10. Based on the thread about opening the Windows Camera, I did some trials to acquire the taken images and use them in my program. In the documentation and its API I didn't find usable snippets (for me), so I created the hack below. It assumes that the images land in the target folder 'C:\\Users\\*username*\\Pictures\\Camera Roll', which is mentioned in the registry (see below), but I don't know if this is reliable or how to get the proper key name.
I don't think that this is the only, or the cleanest, solution. So my question is: how do I get the taken images and open/close the Camera properly?
Actually the function waits until 'WindowsCamera.exe' has left the process list, then returns the newly added images / videos in the target folder.
In the registry I found the entry Computer\HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders with the key name {3B193882-D3AD-4eab-965A-69829D1FB59F} for the target folder. I don't think that this key name is usable.
Working example of my hack:
import subprocess
import pathlib
import psutil

def check_for_files(path, pattern):
    print(" check_for_files:", (path, pattern))
    files = []
    for filename in pathlib.Path(path).rglob(pattern):
        files.append(filename)
    return files

def get_Windows_Picture(picpath):
    prefiles = check_for_files(picpath, '*.jpg')
    x = subprocess.call('start microsoft.windows.camera:', shell=True)
    processlist = [proc.info['name'] for proc in psutil.process_iter(['name'])]
    while 'WindowsCamera.exe' in processlist:
        processlist = [proc.info['name'] for proc in psutil.process_iter(['name'])]
    postfiles = check_for_files(picpath, '*.jpg')
    newfiles = []
    for file in postfiles:
        if file not in prefiles:
            newfiles.append(str(file))
    return newfiles

if __name__ == "__main__":
    picpath = str(pathlib.Path("C:/Users/*user*/Pictures/Camera Roll"))
    images = get_Windows_Picture(picpath)
    print("Images:", images)
The Camera Roll is a "known Windows folder" which means some APIs can retrieve the exact path (even if it's non-default) for you:
SHGetKnownFolderPath
SHGetKnownFolderIDList
SHSetKnownFolderPath
The knownfolderid documentation will give you the constant name of the required folder (in your case FOLDERID_CameraRoll). As you can see in the linked page, the default is %USERPROFILE%\Pictures\Camera Roll (It's the default, so this doesn't mean it's the same for everyone).
The problem in Python is that you'll need to use ctypes, which can be cumbersome at times (especially in your case, where you'll have to deal with GUIDs and release the memory returned by the API).
This gist gives a good example on how to call SHGetKnownFolderPath from Python with ctypes. In your case you'll only need the CameraRoll member in the FOLDERID class so you can greatly simplify the code.
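For completeness, here is a minimal ctypes sketch of that call. The GUID below is FOLDERID_CameraRoll as defined in knownfolders.h ({AB5FB87B-7CE2-4F83-915D-550846C9537B}); verify it against the knownfolderid documentation before relying on it, and note that the actual API call only runs on Windows:

```python
import ctypes
import sys

class GUID(ctypes.Structure):
    """Minimal layout matching the Windows GUID struct."""
    _fields_ = [
        ("Data1", ctypes.c_uint32),
        ("Data2", ctypes.c_uint16),
        ("Data3", ctypes.c_uint16),
        ("Data4", ctypes.c_ubyte * 8),
    ]

    def __init__(self, guid_string):
        super().__init__()
        parts = guid_string.strip("{}").split("-")
        self.Data1 = int(parts[0], 16)
        self.Data2 = int(parts[1], 16)
        self.Data3 = int(parts[2], 16)
        for i, b in enumerate(bytes.fromhex(parts[3] + parts[4])):
            self.Data4[i] = b

# From knownfolders.h -- check the knownfolderid docs to confirm.
FOLDERID_CameraRoll = "{AB5FB87B-7CE2-4F83-915D-550846C9537B}"

def known_folder_path(guid_string):
    """Return the filesystem path of a known folder (Windows only)."""
    fid = GUID(guid_string)
    ppath = ctypes.c_wchar_p()
    hr = ctypes.windll.shell32.SHGetKnownFolderPath(
        ctypes.byref(fid), 0, None, ctypes.byref(ppath))
    if hr:
        raise ctypes.WinError(hr)
    try:
        return ppath.value
    finally:
        # The API allocates the string; the caller must free it.
        ctypes.windll.ole32.CoTaskMemFree(ppath)

if sys.platform == "win32":
    print(known_folder_path(FOLDERID_CameraRoll))
```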
Side note: Don't poll for the process end, just use the wait() function on the Popen object.
Quite simply, I am cycling through all subfolders in a specific location and collecting a few numbers from three different files.
def GrepData():
    import glob as glob
    import os as os
    os.chdir('RUNS')
    RUNSDir = os.getcwd()
    Directories = glob.glob('*.*')
    ObjVal = []
    ParVal = []
    AADVal = []
    for dir in Directories:
        os.chdir(dir)
        (X, Y) = dir.split(sep='+')
        AADPath = glob.glob('Aad.out')
        ObjPath = glob.glob('fobj.out')
        ParPath = glob.glob('Par.out')
        try:
            with open(os.path.join(os.getcwd(), ObjPath[0])) as ObjFile:
                for line in ObjFile:
                    ObjVal.append(list([X, Y, line.split()[0]]))
            ObjFile.close()
        except(IndexError):
            ObjFile.close()
        try:
            with open(os.path.join(os.getcwd(), ParPath[0])) as ParFile:
                for line in ParFile:
                    ParVal.append(list([X, Y, line.split()[0]]))
            ParFile.close()
        except(IndexError):
            ParFile.close()
        try:
            with open(os.path.join(os.getcwd(), AADPath[0])) as AADFile:
                for line in AADFile:
                    AADVal.append(list([X, Y, line.split()[0]]))
            AADFile.close()
        except(IndexError):
            AADFile.close()
        os.chdir(RUNSDir)
Each file open is placed in a try-except block, since in a few cases the file that is opened will be empty, and appending line.split()[0] would then lead to an IndexError on the empty list.
However, when running this script I get the following error: "OSError: [Errno 24] Too many open files".
I was under the impression that the idea of the "with open..." statement was that it takes care of closing the file after use? Clearly that is not happening here.
So what I am asking for is two things:
The answer to: "Is my understanding of with open correct?"
How can I correct whatever error is inducing this problem?
(And yes, I know the code is not exactly elegant. The whole try-except ought to be a single function that is reused, but I will fix that after figuring out this error.)
Try moving your try-except inside the with like so:
with open(os.path.join(os.getcwd(), ObjPath[0])) as ObjFile:
    for line in ObjFile:
        try:
            ObjVal.append(list([X, Y, line.split()[0]]))
        except(IndexError):
            pass
Notes: there is no need to close your file manually; that is what with is for. Also, there is no need to write "as os" in your imports if you are keeping the same name.
"Too many open files" has nothing to do with writing semantically incorrect python code, and you are using with correctly. The key is the part of your error that says "OSError," which refers to the underlying operating system.
When you call open(), the python interpreter will execute a system call. The details of the system call vary a bit by which OS you are using, but on linux this call is open(2). The operating system kernel will handle the system call. While the file is open, it has an entry in the system file table and takes up OS resources -- this means effectively it is "taking up space" whilst it is open. As such the OS has a limit to the number of files that can be opened at any one time.
Your problem is that while you call open(), you don't call close() quickly enough. In the event that your directory structure requires you to have many thousands files open at once that might approach this cap, it can be temporarily changed (at least on linux, I'm less familiar with other OSes so I don't want to go into too many details about how to do this across platforms).
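To illustrate the cap the error message refers to: on Unix-like systems the standard resource module (not available on Windows) exposes the per-process file-descriptor limit, and an unprivileged process may raise its soft limit up to the hard limit:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors
# that "[Errno 24] Too many open files" refers to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Raise (or cap) the soft limit, staying within the hard limit.
if hard != resource.RLIM_INFINITY:
    new_soft = min(4096, hard)
else:
    new_soft = 4096
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```

This is a workaround, not a fix; restructuring the code so files are closed promptly (as in the accepted suggestion above) is the real solution.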
Simply moving the file to ~/.Trash/ will not work: if the file is on an external drive, that would move it onto the main system drive.
There are other conditions too, such as files on external drives being moved to /Volumes/.Trash/501/ (or whatever the current user's ID is).
Given a file or folder path, what is the correct way to determine the trash folder? I imagine the language is pretty irrelevant, but I intend to use Python.
Based upon code from http://www.cocoadev.com/index.pl?MoveToTrash I have come up with the following:
import os

def get_trash_path(input_file):
    path, file = os.path.split(input_file)
    if path.startswith("/Volumes/"):
        # /Volumes/driveName/.Trashes/<uid>
        s = path.split(os.path.sep)
        # s[2] is the drive name ([0] is empty, [1] is "Volumes")
        trash_path = os.path.join("/Volumes", s[2], ".Trashes", str(os.getuid()))
        if not os.path.isdir(trash_path):
            raise IOError("Volume appears to be a network drive (%s could not be found)" % trash_path)
    else:
        trash_path = os.path.join(os.getenv("HOME"), ".Trash")
    return trash_path
Fairly basic, and there are a few things that have to be done separately, particularly checking whether the filename already exists in the trash (to avoid overwriting) and the actual moving to the trash, but it seems to cover most cases (internal, external and network drives).
Update: I wanted to trash a file in a Python script, so I re-implemented Dave Dribin's solution in Python:
from AppKit import NSURL
from ScriptingBridge import SBApplication

def trashPath(path):
    """Trashes a path using the Finder, via OS X's Scripting Bridge.
    """
    targetfile = NSURL.fileURLWithPath_(path)
    finder = SBApplication.applicationWithBundleIdentifier_("com.apple.Finder")
    items = finder.items().objectAtLocation_(targetfile)
    items.delete()
Usage is simple:
trashPath("/tmp/examplefile")
Alternatively, if you're on OS X 10.5, you could use Scripting Bridge to delete files via the Finder. I've done this in Ruby code here via RubyCocoa. The gist of it is:
url = NSURL.fileURLWithPath(path)
finder = SBApplication.applicationWithBundleIdentifier("com.apple.Finder")
item = finder.items.objectAtLocation(url)
item.delete
You could easily do something similar with PyObjC.
A better way is NSWorkspaceRecycleOperation, which is one of the operations you can use with -[NSWorkspace performFileOperation:source:destination:files:tag:]. The constant's name is another artifact of Cocoa's NeXT heritage; its function is to move the item to the Trash.
Since it's part of Cocoa, it should be available to both Python and Ruby.
In Python, without using the scripting bridge, you can do this:
from AppKit import NSWorkspace, NSWorkspaceRecycleOperation
source = "path holding files"
files = ["file1", "file2"]
ws = NSWorkspace.sharedWorkspace()
ws.performFileOperation_source_destination_files_tag_(NSWorkspaceRecycleOperation, source, "", files, None)
The File Manager API has a pair of functions called FSMoveObjectToTrashAsync and FSPathMoveObjectToTrashSync.
Not sure if that is exposed to Python or not.
Another one in ruby:
Appscript.app('Finder').items[MacTypes::Alias.path(path)].delete
You will need rb-appscript gem, you can read about it here