Navigating a "directory" structure in a text file

I am making a Python script which will allow, among other things, downloading files from an S3 filestore. I'm using the boto module to do this. As a first step, I get a list of files in a user-specified bucket. I'm storing that list in a temporary text file. Although S3 doesn't really have directories, we fake it the same way as everyone else by prepending a fake path to the filename. So, suppose I have the following in my bucket:
2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README
This is a very, very short version of the file. The real one is about 35,000 lines, so it needs to be presented to the user in a manageable way. I'm looking for suggestions on how to go about this. The way I've attempted has worked well, except that it assumed that everything would share a common directory path length. As you can see, that's no longer true. I'm assured that more variations will be coming, so I'd like to accommodate essentially arbitrary directory/file structures.
My method was, in effect, to extract the leftmost part of each path (that is, the top-level directory), create a uniq'd list of those, and present that to the user to choose. Then, when they choose, take everything starting with their choice and extract the second part of the path (if it existed), uniq those and present them to the user. When they choose, concatenate their first selection, a /, and their second selection, and repeat until there's no more path left. This is unwieldy and it's hard to say, for example, "this directory contains both files and directories."
How would you go about this? I'm having a hard time wrapping my head around this without creating an awkward presentation and spaghettified code. Thank you.

If I understand your question correctly, you want to be able to "drill down" into a list of path-like strings, correct?
If so, I'd suggest the newer pathlib module in the standard library. The code I'll show allows you to do something like this:
Current path:
1: 2015-04-12/
2: 2015-04-13/
3: README
? 2
Current path: 2015-04-13
1: logs/
2: summary
? 1
Current path: 2015-04-13/logs
1: east/
2: west/
? 2
Current path: 2015-04-13/logs/west
1: 01.gz
2: 02.gz
? 1
You have selected: 2015-04-13/logs/west/01.gz
Now for the code... First, we import pathlib and convert our list of strings to a list of pathlib.Path objects:
import pathlib
paths = (
"""
2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README""").split()
paths = [pathlib.Path(p) for p in paths]
Now I'll want to make some helper functions. First is a menu function that asks the user to select an entry from a list of choices. This will return an element of the list:
def menu(choices):
    for i, choice in enumerate(choices, start=1):
        message = '{}: {}'.format(i, choice)
        print(message)
    while True:
        try:
            index = int(input('? ')) - 1
            if index < 0:
                # Reject 0 and negatives, which would otherwise index from the end.
                raise IndexError
            selection = choices[index]
        except (ValueError, IndexError):
            message = 'Invalid selection: must be between 1 and {}.'
            print(message.format(len(choices)))
        else:
            return selection
We'll need a list of choices to give to that function, so we'll make a path_choices function that builds one. We give this function a container of full paths and the current path that the user has selected. It then returns the "next steps" that the user can take. For example, if we have a list of possibilities: ['foo/apple', 'foo/banana/one.txt', 'foo/orange/pear/summary.txt'], and curpath is foo, then this function will return {'apple', 'banana/', 'orange/'}. Note that the directories have trailing slashes, which makes them easy to tell apart from files.
def path_choices(possibilities, curpath):
    choices = set()
    for path in possibilities:
        parts = path.relative_to(curpath).parts
        root = parts[0]
        if len(parts) > 1:
            root += '/'
        choices.add(root)
    return choices
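As a quick non-interactive check, here is a minimal sketch that runs path_choices on the example from the paragraph above. The function is repeated so that the snippet runs on its own; the sample paths are the illustrative ones from the text, not real bucket contents.

```python
import pathlib


def path_choices(possibilities, curpath):
    """Return the distinct next-level names below curpath.

    Names that have further components below them get a trailing slash.
    """
    choices = set()
    for path in possibilities:
        parts = path.relative_to(curpath).parts
        root = parts[0]
        if len(parts) > 1:
            root += '/'
        choices.add(root)
    return choices


possibilities = [pathlib.Path(p) for p in (
    'foo/apple',
    'foo/banana/one.txt',
    'foo/orange/pear/summary.txt',
)]
choices = path_choices(possibilities, pathlib.Path('foo'))
print(sorted(choices))  # ['apple', 'banana/', 'orange/']
```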
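Lastly, we'll have a simple function to filter a container of paths, only returning paths which lie strictly below curpath. Checking path.parents rather than comparing string prefixes keeps a sibling such as foobar/ from being mistaken for something inside foo/:
def filter_paths(possibilities, curpath):
    for path in possibilities:
        # curpath is never among the parents of curpath itself,
        # so the selected entry is excluded automatically.
        if curpath in path.parents:
            yield path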
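One thing to watch for when filtering: a plain string-prefix test on paths can match sibling names that merely share leading characters, and relative_to would then raise ValueError for them. The sketch below contrasts a prefix test with an ancestor test; the directory names logs and logs-old are made up for illustration.

```python
import pathlib

paths = [pathlib.Path(p) for p in ('logs/a.gz', 'logs-old/b.gz', 'logs')]
curpath = pathlib.Path('logs')

# String prefix test: also matches the sibling directory 'logs-old'.
by_prefix = [p for p in paths
             if p != curpath and str(p).startswith(str(curpath))]

# Ancestor test: keeps only paths that are strictly below curpath.
by_parent = [p for p in paths if curpath in p.parents]

print([p.as_posix() for p in by_prefix])  # ['logs/a.gz', 'logs-old/b.gz']
print([p.as_posix() for p in by_parent])  # ['logs/a.gz']
```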
After this, it's just a matter of gluing these functions together:
curpath = ''
possibilities = paths
while possibilities:
    print('Current path: {}'.format(curpath))
    choices = sorted(path_choices(possibilities, curpath))
    selection = menu(choices)
    if curpath:
        curpath /= selection
    else:
        curpath = pathlib.Path(selection)
    possibilities = list(filter_paths(possibilities, curpath))
    print()
print('You have selected:', curpath)
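If you would rather build the structure once instead of re-filtering the list at every step, one alternative (not part of the code above) is a nested dict. This is only a sketch: build_tree is a hypothetical helper, it marks files with None, and it assumes no key is both a file and a prefix of another key. It makes "this directory contains both files and directories" easy to express.

```python
def build_tree(keys):
    """Nest '/'-separated keys into dicts; a file maps to None."""
    tree = {}
    for key in keys:
        *dirs, name = key.split('/')
        node = tree
        for d in dirs:
            # Create the sub-dict for this directory on first sight.
            node = node.setdefault(d, {})
        node[name] = None
    return tree


keys = [
    '2015-04-12/logs/east/01.gz',
    '2015-04-12/logs/west/01.gz',
    '2015-04-12/summary',
    'README',
]
tree = build_tree(keys)

# Split one level into sub-directories and files.
day = tree['2015-04-12']
subdirs = sorted(k for k, v in day.items() if v is not None)
files = sorted(k for k, v in day.items() if v is None)
print(subdirs, files)  # ['logs'] ['summary']
```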

Related

Python join directory paths

I have problems with os.path.join() because it never joins the complete path.
The code is:
import os
import pathlib
get_base_dir = os.getenv('BUILD_DIRECTORY')
base_dir_path = pathlib.Path(get_base_dir)
print(base_dir_path)  # output is: F:\file\temp\ - which is correct
s_dir = '/sw/folder1/folder2/'
s_dir_path = pathlib.Path(s_dir)
print(s_dir_path)  # output is: \sw\folder1\folder2\
full_path = os.path.join(base_dir_path, s_dir_path)
print(full_path)  # output is: F:\\sw\\folder1\\folder2 instead of F:\\file\\temp\\sw\\folder1\\folder2
Does anyone have an idea of what goes wrong?

This behavior matches what the os.path.join docs state:
Join one or more path components intelligently. The return value is
the concatenation of path and any members of *paths with exactly one
directory separator following each non-empty part except the last,
meaning that the result will only end in a separator if the last part
is empty. If a component is an absolute path, all previous components
are thrown away and joining continues from the absolute path
component.
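(Emphasis mine; the last sentence is the relevant one. Because '/sw/folder1/folder2/' starts with a separator, it counts as an absolute component, so everything in the base path except the drive letter is thrown away.)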
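To see the rule in action without a Windows machine, here is a small sketch using ntpath (the Windows flavour of os.path, importable on any platform) and pathlib.PureWindowsPath. The base and sub values mirror the ones in the question; stripping the leading separator is one possible fix, assuming the second path is meant to be relative to the base.

```python
import ntpath  # Windows implementation of os.path, usable on any OS
from pathlib import PureWindowsPath

base = 'F:\\file\\temp\\'
sub = '/sw/folder1/folder2/'

# The leading slash makes `sub` rooted, so the base path is dropped
# and only the drive letter F: survives.
joined_rooted = ntpath.join(base, sub)

# Stripping leading separators makes `sub` relative; the base is kept.
joined_relative = ntpath.join(base, sub.lstrip('/\\'))

# pathlib follows the same rule and also normalises the separators.
full = PureWindowsPath(base) / sub.lstrip('/\\')

print(joined_rooted)
print(joined_relative)
print(full)  # F:\file\temp\sw\folder1\folder2
```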

Return strings based on folders in directory

I'm trying to write code that looks into a specified directory and returns the names of the folders in it. I need them to come back as separate strings, instead of on one line, so that I can then use them to create new folders based on the names the code returned.
So far this is what I have:
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return str(my_list).replace('[','').replace(']','').replace("'", "")
Library_Folder = '%s' % ( Lib_Folder() )
print Library_Folder
and it returns this
# Result: testfolder1, testfolder2
What I would like it to return is
testfolder1
testfolder2
Does anyone know how I can achieve this?

It's best to process the values as they are rather than the string representation of a Python list; that route has lots of little edge cases that will cause problems.
Also, your description mentions wanting only the directories in the target folder, but you're not checking for that, so you will output the files as well.
Finally, this is a matter of preference, but I avoid building up lists in memory when I can. It really doesn't matter for a handful of directory names, but it's a good habit to get into, so this solution shows how to use a generator:
import os
def Lib_Folder():
    target = '/Users/Tim/Test/Test Library'
    for cur in os.listdir(target):
        # Check to see if the current item is a directory
        if os.path.isdir(os.path.join(target, cur)):
            # Ignore this folder if it exists
            if cur != '.DS_Store':
                # It's a good folder, return it
                yield cur
# Print out each item as it occurs
for cur in Lib_Folder():
    print(cur)
# Or, print them out as one string:
values = "\n".join(Lib_Folder())
print(values)

Yeah, there's a much easier way to do this. Don't try to manipulate the string that represents the list. Work with the list itself. Use str.join to separate each entry with newlines.
import os
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return "\n".join(my_list) # Puts each on its own line.
Library_Folder = Lib_Folder() # The %s thing you were doing is unnecessary.
# Let's check whether it worked!
print(Library_Folder)
# testfolder1
# testfolder2
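Since the goal is to reuse the names to create new folders, returning a list may be more convenient than a joined string. The following is a minimal sketch using os.scandir, which reports whether each entry is a directory; subfolder_names is a hypothetical helper, and the temporary directory only stands in for the real library folder.

```python
import os
import tempfile


def subfolder_names(target):
    """Return the sorted names of the directories directly inside target."""
    with os.scandir(target) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


# Build a throwaway folder layout to demonstrate the helper.
with tempfile.TemporaryDirectory() as library:
    os.mkdir(os.path.join(library, 'testfolder1'))
    os.mkdir(os.path.join(library, 'testfolder2'))
    # A plain file such as .DS_Store is skipped because it is not a directory.
    open(os.path.join(library, '.DS_Store'), 'w').close()
    names = subfolder_names(library)

# One name per line, as asked for in the question.
print("\n".join(names))
```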

With Python how can I replace part of a path based on a piece that matches?

I have a path that looks like: data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac
What I want to do is replace the second part, so that it's data/dev-clean/1988/24833/1988-24833-0013.flac. I can't guarantee anything about the second section, other than it starts with dev-.
I need to make it general-purpose, so that it'll work with any arbitrary stem, such as train-, and so on.

You can use re to match the part and replace it:
import re
def func(pattern, file):
    return re.sub(f'{pattern}[^/]+/', f'{pattern}clean/', file)
func('dev-', 'data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/dev-clean/1988/24833/1988-24833-0013.flac
func('train-', 'data/train-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/train-clean/1988/24833/1988-24833-0013.flac
func('train-', 'data/xxx/xxx/train-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac')
#data/xxx/xxx/train-clean/1988/24833/1988-24833-0013.flac

def replace_second_part(path, with_what):
    parts = path.split('/')
    parts[1] = with_what
    return '/'.join(parts)
If you want this to be more portable (to work under Windows, for example), prefer os.path over splitting on '/' by hand.
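A component-wise variant that does not depend on the position of the part can be sketched with pathlib. This is only an illustration: replace_stem_part is a hypothetical name, the 'clean' suffix is taken from the question, and PurePosixPath is used because the example paths use forward slashes.

```python
from pathlib import PurePosixPath


def replace_stem_part(path, stem, suffix='clean'):
    """Replace every directory component that starts with stem by stem + suffix."""
    *dirs, filename = PurePosixPath(path).parts
    # Only directory components are considered; the file name is left alone.
    dirs = [stem + suffix if d.startswith(stem) else d for d in dirs]
    return str(PurePosixPath(*dirs, filename))


result = replace_stem_part(
    'data/dev-noise-subtractive-250ms-1/1988/24833/1988-24833-0013.flac',
    'dev-',
)
print(result)  # data/dev-clean/1988/24833/1988-24833-0013.flac
```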

Incremental Saves

I am trying to write up a script on incremental saves but there are a few hiccups that I am running into.
If the file name is "aaa.ma", I get the following error: ValueError: invalid literal for int() with base 10: 'aaa'. It does not happen if my file is named "aaa_0001".
This happens when I write my code in this format: Link
To rectify the problem, I added an if/else statement (Link). It seems to have resolved the issue at hand, but I was wondering if there is a better approach to this?
Any advice will be greatly appreciated!

Use regexes for better flexibility, especially in file rename scripts like these.
In your case, since you know that the expected filename format is "some_file_name_<increment_number>", you can use regexes to do the searching and matching for you. The reason to do this is that users are not machines and may not stick to the exact naming conventions that our scripts expect. For example, the user may name the file aaa_01.ma or even aaa001.ma instead of the aaa_0001 that your script currently expects. To build this flexibility into your script, you could do:
import os
import re
# lastIncFile and curFileDir come from your existing script.
# name = lastIncFile.partition(".")[0] # Use os.path.splitext instead
name, ext = os.path.splitext(lastIncFile) # ext keeps its leading dot, e.g. ".ma"
match_object = re.search("([a-zA-Z]*)_*([0-9]*)$", name)
# Here ([a-zA-Z]*) would be group(1) and would have "aaa" for ex.
# and ([0-9]*) would be group(2) and would have "0001" for ex.
# _* indicates that there may be an _, or not.
# The $ indicates that ([0-9]*) would be the LAST part of the name.
padding = 4 # Try and parameterize as many components as possible for easy maintenance
default_starting = 1
verName = str(default_starting).zfill(padding) # Default verName
if match_object: # Every group is optional, so this matches even without a version
    name = match_object.group(1)
    version_component = match_object.group(2)
    if version_component: # Non-empty only if trailing digits were found
        verName = str(int(version_component) + 1).zfill(padding)
newFileName = "%s_%s%s" % (name, verName, ext) # ext already includes the dot
incSaveFilePath = os.path.join(curFileDir, newFileName)
Check out this nice tutorial on Python regexes to get an idea what is going on in the above block. Feel free to tweak, evolve and build the regex based on your use cases, tests and needs.
Extra tips:
Call cmds.file(renameToSave=True) at the beginning of the script. This will ensure that the file does not get saved over itself accidentally, and forces the script/user to rename the current file. Just a safety measure.
If you want to go a little fancy with your regex and make it more readable, you could try this:
match_object = re.search("(?P<name>[a-zA-Z]*)_*(?P<version>[0-9]*)$", name)
name = match_object.group('name')
version_component = match_object.group('version')
Here we use the (?P<var_name>...) syntax to assign a name to the matching group. This makes for better readability when you access it: mo.group('version') is much clearer than mo.group(2).
Make sure to go through the official docs too.
Save using Maya's commands. This will ensure Maya does all its checks before and while saving:
cmds.file(rename=incSaveFilePath)
cmds.file(save=True)
Update-2:
If you want space to be checked here's an updated regex:
match_object = re.search("(?P<name>[a-zA-Z]*)[_ ]*(?P<version>[0-9]*)$", name)
Here [_ ]* will match zero or more occurrences of _ or a space. For more regex features, trying things out and learning on your own is the best way. Check out the links in this post.
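Putting the pieces together outside of Maya, the naming logic can be sketched as a standalone function. This is an illustration rather than the exact code above: next_incremental_name is a hypothetical name, and the pattern uses a lazy .*? for the name part (a deliberate change) so that names containing digits or underscores are kept whole.

```python
import os
import re

# Lazy name, optional '_' or space separators, optional trailing digits.
VERSION_RE = re.compile(r'^(?P<name>.*?)[_ ]*(?P<version>[0-9]*)$')


def next_incremental_name(filename, padding=4, start=1):
    """Return filename with its trailing version number incremented.

    A file without a version number gets the starting version.
    """
    base, ext = os.path.splitext(filename)  # ext keeps its leading dot
    match = VERSION_RE.match(base)
    version = match.group('version')
    number = int(version) + 1 if version else start
    return '{}_{}{}'.format(match.group('name'), str(number).zfill(padding), ext)


print(next_incremental_name('aaa.ma'))       # aaa_0001.ma
print(next_incremental_name('aaa_0001.ma'))  # aaa_0002.ma
print(next_incremental_name('aaa001.ma'))    # aaa_0002.ma
```

The result can then be joined with the scene directory and handed to cmds.file(rename=...) as in the tips above.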
Hope this helps.

Exact match in strings in python

I am trying to find a sub-string in a string, but I am not achieving the results I want.
I have several strings that contain the paths to different directories:
'/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_1/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_9/logs'
Here is the part of my code where I am trying to find the exact match for the sub-string:
for name in sample_names:
    if (dire.find(name)!=-1):
        for files in os.walk(dire):
            for file in files:
                list_files=[]
                list_files.append(file)
                file_dict[name]=list_files
Everything works fine except that when it looks for Sample_8_1 in the string that contains the directory, the if condition also accepts the name Sample_8_11. How can I make it so that it makes an exact match to prevent from entering the same directory more than once?

You could try searching for Sample_8_1/ (i.e., include the following slash). Given your code, I guess that would be dire.find(name+'/'). This is just a quick and dirty approach.

Assuming that dire is a list of absolute path names, a membership test matches whole entries only:
for name in sample_names:
    if name in dire:
        ...
e.g.
samples = ['/home/msvalkon/work/tmp_1',
           '/home/msvalkon/work/tmp_11']
dirs = ['/home/msvalkon/work/tmp_11']
for name in samples:
    if name in dirs:
        print("Entry %s matches" % name)
Entry /home/msvalkon/work/tmp_11 matches
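If dire is a single path string, as in the question, another option is to compare whole path components instead of raw substrings. The sketch below assumes that the sample name forms the tail of one directory name, as in result_files_Sample_8_1; dir_matches_sample is a hypothetical helper.

```python
from pathlib import PurePosixPath


def dir_matches_sample(directory, sample_name):
    """True if some component of directory ends with sample_name exactly."""
    return any(part.endswith(sample_name)
               for part in PurePosixPath(directory).parts)


dirs = [
    '/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs',
    '/Users/mymac/Desktop/test_python/result_files_Sample_8_1/logs',
    '/Users/mymac/Desktop/test_python/result_files_Sample_8_9/logs',
]
matches = [d for d in dirs if dir_matches_sample(d, 'Sample_8_1')]
print(matches)  # only the result_files_Sample_8_1 directory
```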
