Python join directory paths


I'm having problems with os.path.join() because it never joins the complete path.
The code is:
import os
import pathlib
get_base_dir = os.getenv('BUILD_DIRECTORY')
base_dir_path = pathlib.Path(get_base_dir)
print(base_dir_path)  # output is: F:\file\temp\ - which is correct
s_dir = '/sw/folder1/folder2/'
s_dir_path = pathlib.Path(s_dir)
print(s_dir_path)  # output is: \sw\folder1\folder2\
full_path = os.path.join(base_dir_path, s_dir_path)
print(full_path)  # output is: F:\\sw\\folder1\\folder2 instead of F:\\file\\temp\\sw\\folder1\\folder2
Does anyone have an idea of what goes wrong?

This behavior matches what the os.path.join documentation states:
Join one or more path components intelligently. The return value is
the concatenation of path and any members of *paths with exactly one
directory separator following each non-empty part except the last,
meaning that the result will only end in a separator if the last part
is empty. If a component is an absolute path, all previous components
are thrown away and joining continues from the absolute path
component.
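The last sentence explains the result. s_dir starts with a slash, so pathlib.Path('/sw/folder1/folder2/') is an absolute path (on Windows, one rooted on the current drive). os.path.join therefore throws away the previous component, keeping only its drive letter, which gives F:\sw\folder1\folder2.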
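A minimal sketch of one fix, assuming s_dir is meant to be relative to the build directory: strip the leading separators before joining. PureWindowsPath is used only so the example gives Windows results on any OS; on Windows itself, base_dir_path / s_dir.lstrip('/\\') with pathlib.Path behaves the same way.

```python
from pathlib import PureWindowsPath

# Values from the question (BUILD_DIRECTORY replaced by a literal)
base_dir = PureWindowsPath(r"F:\file\temp")
s_dir = "/sw/folder1/folder2/"

# The leading "/" makes s_dir absolute, so only the drive of base_dir survives
broken = base_dir / s_dir

# Stripping leading separators turns s_dir into a relative path first
fixed = base_dir / s_dir.lstrip("/\\")

print(broken)  # F:\sw\folder1\folder2
print(fixed)   # F:\file\temp\sw\folder1\folder2
```

Stripping the separator is the smallest change; if s_dir comes from configuration you control, storing it without the leading slash avoids the issue entirely.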

Related

Return strings based on folders in directory

I'm trying to write code that looks into a specified directory and returns the names of the folders in it. I need each name as a separate string, instead of all of them on one line, so that I can then use them to create new folders based on those names.
So far this is what I have:
import os
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return str(my_list).replace('[','').replace(']','').replace("'", "")
Library_Folder = '%s' % ( Lib_Folder() )
print(Library_Folder)
It returns this:
# Result: testfolder1, testfolder2
What I would like it to return is:
testfolder1
testfolder2
Does anyone know how I can achieve this?

It's best to process the values as they are rather than the string representation of a Python list; that route has lots of little edge cases that will cause problems.
Also, your description says you only want the directories in the target folder, but you're not checking for that, so you will output the files as well.
This is a preference thing, but I prefer to avoid building up lists in memory when I can. It really doesn't matter for a handful of directory names, but it's a habit I like to get into, so this solution uses a generator:
import os
def Lib_Folder():
    target = '/Users/Tim/Test/Test Library'
    for cur in os.listdir(target):
        # Check to see if the current item is a directory
        if os.path.isdir(os.path.join(target, cur)):
            # Skip the macOS metadata entry if it shows up
            if cur != '.DS_Store':
                # It's a good folder, yield it
                yield cur
# Print out each item as it occurs
for cur in Lib_Folder():
    print(cur)
# Or, print them out as one string:
values = "\n".join(Lib_Folder())
print(values)

Yeah, there's a much easier way to do this. Don't try to manipulate the string that represents the list. Work with the list itself. Use str.join to separate each entry with newlines.
import os
def Lib_Folder():
    my_list = os.listdir('/Users/Tim/Test/Test Library')
    if '.DS_Store' in my_list:
        my_list.remove('.DS_Store')
    return "\n".join(my_list)  # Puts each on its own line.
Library_Folder = Lib_Folder()  # The %s thing you were doing is unnecessary.
# Let's check whether it worked!
print(Library_Folder)
# testfolder1
# testfolder2
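Both answers print names in whatever order os.listdir() returns them, which is arbitrary, so the testfolder1/testfolder2 order is not guaranteed. A sketch that combines the directory check from the first answer with the str.join approach from the second, written with pathlib and sorted. library_folders is a hypothetical name, and the target directory is a parameter so the demo can run against a temporary folder instead of '/Users/Tim/Test/Test Library':

```python
import pathlib
import tempfile

def library_folders(target):
    """Return the subdirectory names of target, one per line, sorted."""
    names = [entry.name for entry in pathlib.Path(target).iterdir()
             # is_dir() already drops plain files such as .DS_Store
             if entry.is_dir()]
    return "\n".join(sorted(names))

# Demo on a throwaway directory with two folders and two files
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    for name in ('testfolder2', 'testfolder1'):
        (base / name).mkdir()
    (base / '.DS_Store').touch()
    (base / 'notes.txt').touch()
    result = library_folders(tmp)

print(result)
```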

How can I isolate a section of a path

I want to extract the path that contains the word "tk-nuke-writenode" from the example below.
I need to isolate that particular path and only that path. The paths are not fixed, so I can't just split the string and select the "tk-nuke-writenode" path by index (e.g. [2]). See the example below:
NUKE_PATH = os.environ['NUKE_PATH']
Result:
'X:\pipeline\app_config\release\extensions\global\nuke;X:\pipeline\app_config\release\extensions\projects\sgtk\powerPlant\install\app_store\tk-nuke\v0.11.4\classic_startup\restart;X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
NUKE_PATH.split(os.pathsep)[2]
Result:
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
Wanted output:
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
Thanks in advance for any help you can offer!

The paths are separated by ';', so you can use something like this:
import os
NUKE_PATH = os.environ['NUKE_PATH']
paths = NUKE_PATH.split(';')
# In Python 3, filter() returns a lazy iterator, so wrap it in list()
result = list(filter(lambda path: "tk-nuke-writenode" in path, paths))

Assuming your paths do not contain any semicolons, you can do the following. Note the r prefix on the literal: without it, sequences such as \a, \r, \n, \t and \v in the Windows paths would be interpreted as escape characters.
>>> path = r'X:\pipeline\app_config\release\extensions\global\nuke;X:\pipeline\app_config\release\extensions\projects\sgtk\powerPlant\install\app_store\tk-nuke\v0.11.4\classic_startup\restart;X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
>>> matches = [p for p in path.split(';') if 'tk-nuke-writenode' in p]
>>> matches[0]
'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
This will raise an IndexError if no matches are found, and it may be a good idea to handle multiple matches as well.
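A sketch that handles both the no-match and the multiple-match case explicitly, assuming either situation should be reported as an error. find_path_entry is a hypothetical helper, and the separator defaults to ';' because that is what this NUKE_PATH uses:

```python
def find_path_entry(path_list, needle, sep=';'):
    """Return the single entry of a separator-joined path list containing needle."""
    matches = [p for p in path_list.split(sep) if needle in p]
    if not matches:
        raise LookupError(f'no entry contains {needle!r}')
    if len(matches) > 1:
        raise LookupError(f'{len(matches)} entries contain {needle!r}: {matches}')
    return matches[0]

# Raw strings keep the Windows backslashes literal
nuke_path = (
    r'X:\pipeline\app_config\release\extensions\global\nuke;'
    r'X:\pipeline\app_config\release\extensions\projects\sgtk\powerPlant\install\app_store\tk-nuke\v0.11.4\classic_startup\restart;'
    r'X:/pipeline/app_config/release/extensions/projects/sgtk/powerPlant/install/app_store/tk-nuke-writenode/v1.4.1/gizmos'
)
entry = find_path_entry(nuke_path, 'tk-nuke-writenode')
print(entry)
```

Note that the second entry contains "tk-nuke" but not "tk-nuke-writenode", so the substring test still finds exactly one match here; a shorter needle would trigger the ambiguity error.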

Navigating a "directory" structure in a text file

I am making a Python script which will allow, among other things, downloading files from an S3 filestore. I'm using the boto module to do this. As a first step, I get a list of files in a user-specified bucket. I'm storing that list in a temporary text file. Although S3 doesn't really have directories, we fake it the same way as everyone else by prepending a fake path to the filename. So, suppose I have the following in my bucket:
2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README
This is a very, very short version of the file. The real one is about 35,000 lines, so it needs to be presented to the user in a manageable way. I'm looking for suggestions on how to go about this. The way I've attempted has worked well, except that it assumed that everything would share a common directory path length. As you can see, that's no longer true. I'm assured that more variations will be coming, so I'd like to accommodate essentially arbitrary directory/file structures.
My method was, in effect, to extract the leftmost part of each path (that is, the top-level directory), create a uniq'd list of those, and present that to the user to choose. Then, when they choose, take everything starting with their choice and extract the second part of the path (if it existed), uniq those and present them to the user. When they choose, concatenate their first selection, a /, and their second selection, and repeat until there's no more path left. This is unwieldy and it's hard to say, for example, "this directory contains both files and directories."
How would you go about this? I'm having a hard time wrapping my head around this without creating an awkward presentation and spaghettified code. Thank you.

If I understand your question correctly, you want to be able to "drill down" into a list of path-like strings, correct?
If so, I'd suggest the newer pathlib module in the standard library. The code I'll show allows you to do something like this:
Current path:
1: 2015-04-12/
2: 2015-04-13/
3: README
? 2
Current path: 2015-04-13
1: logs/
2: summary
? 1
Current path: 2015-04-13/logs
1: east/
2: west/
? 2
Current path: 2015-04-13/logs/west
1: 01.gz
2: 02.gz
? 1
You have selected: 2015-04-13/logs/west/01.gz
Now for the code... First, we import pathlib and convert our list of strings to a list of pathlib.Path objects:
import pathlib
paths = (
"""
2015-04-12/logs/east/01.gz
2015-04-12/logs/east/02.gz
2015-04-12/logs/west/01.gz
2015-04-12/logs/west/02.gz
2015-04-12/summary
2015-04-13/logs/east/01.gz
2015-04-13/logs/east/02.gz
2015-04-13/logs/west/01.gz
2015-04-13/logs/west/02.gz
2015-04-13/summary
README""").split()
paths = [pathlib.Path(p) for p in paths]
Now I'll want to make some helper functions. First is a menu function that asks the user to select an entry from a list of choices. This will return an element of the list:
def menu(choices):
    for i, choice in enumerate(choices, start=1):
        message = '{}: {}'.format(i, choice)
        print(message)
    while True:
        try:
            index = int(input('? '))
            # Reject 0 and negatives, which would otherwise index from the end
            if not 1 <= index <= len(choices):
                raise IndexError
            selection = choices[index - 1]
        except (ValueError, IndexError):
            message = 'Invalid selection: must be between 1 and {}.'
            print(message.format(len(choices)))
        else:
            return selection
We'll need a list of choices to give to that function, so we'll make a path_choices function for that. We give this function a container of full paths and the current path the user has selected, and it returns the "next steps" the user can take. For example, if we have a list of possibilities: ['foo/apple', 'foo/banana/one.txt', 'foo/orange/pear/summary.txt'], and curpath is foo, then this function will return {'apple', 'banana/', 'orange/'}. Note that the directories have trailing slashes, which is nice.
def path_choices(possibilities, curpath):
    choices = set()
    for path in possibilities:
        parts = path.relative_to(curpath).parts
        root = parts[0]
        if len(parts) > 1:
            root += '/'
        choices.add(root)
    return choices
Lastly, we'll have a simple function to filter a container of paths, only returning paths that lie strictly below curpath (so curpath itself is excluded):
def filter_paths(possibilities, curpath):
    for path in possibilities:
        # Compare whole components, so '.../logs' doesn't match '.../logs2/...'
        if curpath in path.parents:
            yield path
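A caveat on filtering by string prefix: str(path).startswith(str(curpath)) also matches sibling directories that merely share leading characters (for example logs and logs2), and path_choices would then fail in relative_to(). Testing curpath in path.parents compares whole components instead. A small demonstration; the logs2 key is hypothetical and not in the question's listing:

```python
import pathlib

curpath = pathlib.PurePosixPath('2015-04-13/logs')
keys = [pathlib.PurePosixPath(k) for k in
        ('2015-04-13/logs/east/01.gz', '2015-04-13/logs2/east/01.gz')]

# String prefix test: the logs2 key also starts with '2015-04-13/logs'
by_prefix = [str(k) for k in keys if str(k).startswith(str(curpath))]

# Component-wise test: only paths strictly below curpath qualify
by_parents = [str(k) for k in keys if curpath in k.parents]

print(by_prefix)   # both keys
print(by_parents)  # ['2015-04-13/logs/east/01.gz']
```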
After this, it's just a matter of gluing these functions together:
curpath = ''
possibilities = paths
while possibilities:
    print('Current path: {}'.format(curpath))
    choices = sorted(path_choices(possibilities, curpath))
    selection = menu(choices)
    if curpath:
        curpath /= selection
    else:
        curpath = pathlib.Path(selection)
    possibilities = list(filter_paths(possibilities, curpath))
    print()
print('You have selected:', curpath)
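An alternative sketch, not part of the answer above: with around 35,000 keys it may be cheaper to build a nested dictionary once and walk it, instead of re-filtering the whole list at every step, and it makes it easy to show when one "directory" holds both files and subdirectories. build_tree and list_children are hypothetical names; directories map to dicts and files map to None.

```python
def build_tree(keys):
    """Turn 'a/b/c'-style keys into nested dicts; files map to None.

    Assumes no key is also used as a directory prefix (e.g. both 'a' and 'a/b').
    """
    tree = {}
    for key in keys:
        *dirs, name = key.split('/')
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = None
    return tree

def list_children(tree, path=''):
    """Return sorted child names under path, with '/' appended to directories."""
    node = tree
    for part in filter(None, path.split('/')):
        node = node[part]
    return sorted(name + '/' if isinstance(child, dict) else name
                  for name, child in node.items())

keys = """2015-04-12/logs/east/01.gz
2015-04-12/summary
2015-04-13/logs/west/01.gz
2015-04-13/summary
README""".split()

tree = build_tree(keys)
print(list_children(tree))                # ['2015-04-12/', '2015-04-13/', 'README']
print(list_children(tree, '2015-04-13'))  # ['logs/', 'summary']
```

The menu function from the answer can be reused unchanged on the lists that list_children returns.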

Parsing directories and detecting unexpected blanks

I'm trying to parse some directories and identify folders that do not follow a specific pattern. For example:
Correct: Level1\\Level2\\Level3\\Level4_ID\\Date\\Hour\\file.txt
Incorrect: Level1\\Level2\\Level3\\Level4\\Date\\Hour\\file.txt
Notice that the incorrect one does not have the _ID. My final goal is to parse the data, replacing the '\' with a delimiter, so I can import it into MS Excel:
Level1;Level2;Level3;Level4;ID;Date;Hour;file.txt
Level1;Level2;Level3;Level4; ;Date;Hour;file.txt
I successfully parsed all the correct data with these steps:
Let files be a list of all my directories:
for path in files:
    processed_str = path.replace(" ", "").replace("_", "\\")
    processed_str = processed_str.split("\\")
My issue is detecting whether or not Level4 folder does have an ID after the underscore using the same script, since "files" contains both correct and incorrect directories.
The problem is that, since the incorrect one does not have the ID, after performing split("\\") the columns are shifted, with no blank between Level4 and Date:
Level1;Level2;Level3;Level4;Date;Hour;file.txt
Thanks,

Do the "_ID" check after splitting the directories; that way you don't lose information. Assuming the directory names themselves don't contain escaped backslashes and that the ID field is always in level 4 (counting from 1, so index 3), this should do it:
for path in files:
    parts = path.split("\\")
    if "_" in parts[3]:
        # Split 'Level4_ID' into two columns: 'Level4' and 'ID'
        level, ident = parts[3].split("_", 1)
        parts[3:4] = [level, ident]
    else:
        # No ID: insert a blank column after Level4
        parts.insert(4, " ")
    final = ";".join(parts)
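Since the end goal is an Excel import, one option is to let the csv module write the semicolon-delimited rows, so any field that happens to contain a ';' gets quoted automatically. A sketch assuming, as in the answer, that the ID always follows an underscore in the fourth component; split_row is a hypothetical helper, and io.StringIO stands in for a real output file:

```python
import csv
import io

def split_row(path):
    parts = path.split('\\')
    # Split 'Level4_ID' into 'Level4' and 'ID'; use ' ' when there is no ID
    name, sep, ident = parts[3].partition('_')
    parts[3:4] = [name, ident if sep else ' ']
    return parts

files = [r'Level1\Level2\Level3\Level4_ID\Date\Hour\file.txt',
         r'Level1\Level2\Level3\Level4\Date\Hour\file.txt']

out = io.StringIO()
writer = csv.writer(out, delimiter=';', lineterminator='\n')
writer.writerows(split_row(f) for f in files)
print(out.getvalue())
```

For a real file, open it with newline='' as the csv documentation recommends, and pass it to csv.writer in place of the StringIO.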

Exact match in strings in python

I am trying to find a sub-string in a string, but I am not achieving the results I want.
I have several strings that contain the paths of different directories:
'/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_1/logs',
'/Users/mymac/Desktop/test_python/result_files_Sample_8_9/logs'
Here is the part of my code where I am trying to find the exact match to the sub-string:
for name in sample_names:
    if (dire.find(name)!=-1):
        for files in os.walk(dire):
            for file in files:
                list_files=[]
                list_files.append(file)
                file_dict[name]=list_files
Everything works fine except that, when it looks for Sample_8_1 in the string that contains the directory, the if condition also accepts Sample_8_11. How can I make it an exact match, so that it doesn't enter the same directory more than once?

You could try searching for Sample_8_1/ (i.e., include the following slash). Given your code, that would be dire.find(name + '/'). This is just a quick and dirty approach.

If dire is a list of absolute path names rather than a single string, in tests whether a whole element equals name, not whether name appears as a substring:
for name in sample_names:
    if name in dire:
        ...
e.g.
samples = ['/home/msvalkon/work/tmp_1',
           '/home/msvalkon/work/tmp_11']
dirs = ['/home/msvalkon/work/tmp_11']
for name in samples:
    if name in dirs:
        print("Entry %s matches" % name)
Entry /home/msvalkon/work/tmp_11 matches
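A more robust variant of the slash trick is to compare whole path components, so Sample_8_1 can only match a directory named exactly that. A sketch assuming each sample directory is named result_files_<name>, as in the question's paths; matches_sample and its prefix parameter are hypothetical:

```python
from pathlib import PurePosixPath

def matches_sample(dire, name, prefix='result_files_'):
    """True if some component of dire is exactly prefix + name."""
    return (prefix + name) in PurePosixPath(dire).parts

dirs = ['/Users/mymac/Desktop/test_python/result_files_Sample_8_11/logs',
        '/Users/mymac/Desktop/test_python/result_files_Sample_8_1/logs',
        '/Users/mymac/Desktop/test_python/result_files_Sample_8_9/logs']

hits = [d for d in dirs if matches_sample(d, 'Sample_8_1')]
print(hits)  # only the ..._Sample_8_1/logs entry
```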
